Abstract

An important purpose in pooling time series and cross section data
is to control for individual-specific unobservable effects which may
be correlated with other explanatory variables: e.g., latent ability
in measuring returns to schooling in earnings equations or managerial
ability in measuring returns to scale in firm cost functions. Using
instrumental variables and the time-invariant characteristic of the
latent variable, we derive

1)  a test for the presence of this effect and for the over-identifying
restrictions we use;

2)  necessary and sufficient conditions for identification of
all the parameters in the model; and

3)  the asymptotically efficient instrumental variables estimator
and conditions under which it differs from the within-groups
estimator.

We calculate efficient estimates of a wage equation from the
Michigan Income Dynamics data which indicate substantial differences
from within-groups and Balestra-Nerlove estimates - particularly
a significantly higher estimate of the returns to schooling.

1.  Introduction

An  important  benefit  from  pooling  time  series  and 
cross  section  data  is  the  ability  to  control  for 
individual-specific  effects  -  possibly  unobservable  -  which 
may  be  correlated  with  other  included  variables  in  the 
specification  of  an  economic  relationship.  Analysis  of 
cross  section  data  alone  can  neither  identify  nor  control 
for  such  individual  effects.  A  specification  test  proposed 
by  Hausman  (1978)  and  subsequently  used  in  a  number  of 
applied  contexts  has  indicated  that  correlated  individual 
effects  may  be  present  in  many  econometric  applications  to 
individual  or  firm  data.  1 

The traditional technique to overcome this problem has been to
eliminate the individual effects in the sample by transforming the
data into deviations from individual means. However, the least
squares coefficient estimates from the transformed data (which are
known as "within-groups" or "fixed effects" estimates) have two
important shortcomings: (1) all time-invariant variables are
eliminated by the transformation so that their coefficients cannot
be estimated, and (2) under certain circumstances, the within-groups
estimator is not fully efficient since it ignores variation across
individuals in the sample. The first problem is usually the more
serious, since in many applications, primary interest is attached to
the unknown coefficients of these variables, e.g., to the coefficient
of schooling in a wage equation specification.

1  This technique corresponds to Model I of the analysis of
variance, e.g., Scheffe (1959). When used in analysis of
covariance, errors in measured variables can create a serious
problem since they are exacerbated by the data transformation.

To consider a specific model, let

(1.1)    $Y_{it} = X_{it}\beta + Z_i\gamma + \alpha_i + \eta_{it}$,    $i = 1,\ldots,N$;  $t = 1,\ldots,T$

where $\beta$ and $\gamma$ are $k$ and $g$ vectors of coefficients
associated with time-varying and time-invariant variables,
respectively. The disturbance $\eta_{it}$ is assumed to be
uncorrelated with the columns of $(X, Z, \alpha)$ and has zero mean
and constant variance $\sigma_\eta^2$ conditional on $X_{it}$ and
$Z_i$. The unobservable individual effect, $\alpha_i$, is assumed to
be a time-invariant random variable, distributed independently
across individuals.

The primary focus of this paper involves the potential correlation
of $\alpha_i$ with the columns of $X$ and $Z$. If such correlations
are present, least squares (OLS) or generalized least squares (GLS)
will yield biased and inconsistent estimates of both $\beta$ and
$\gamma$. Transforming the data into deviations from individual
means eliminates the correlation problem by eliminating the
time-invariant $\alpha_i$; unfortunately, at the same time, it
eliminates the $Z_i$, precluding estimation of $\gamma$. Another
possible approach is to find instruments for those columns of $X$
and $Z$ which are considered potentially correlated with $\alpha_i$
and perform instrumental variables estimation on equation (1.1) or
on a single cross-section. But it may be difficult or impossible to
find appropriate instruments, excluded from equation (1.1), which
are not correlated with $\alpha_i$. For instance, use of family
background variables as instruments for schooling in a wage equation
seems unlikely to eliminate bias, since the unobserved individual
effect is likely to be correlated with measures of family background.

Specifications similar to equation (1.1) have been used in at least
two empirical contexts. If equation (1.1) represents a cost or
production function and $\alpha_i$ denotes the unobservable
managerial efficiency of the i'th firm, Mundlak (1961) has suggested
the use of the within-groups estimator to produce unbiased estimates
of the remaining parameters. If $Y_{it}$ denotes the wage of the i'th
individual in the t'th time period, one of the $Z$'s measures his
schooling, and $\alpha_i$ denotes the unmeasurable component of his
initial ability or ambition, then equation (1.1) represents a
specification for measuring the returns to education. To the extent
that unmeasurable ability and schooling are correlated, the OLS
estimates are biased and inconsistent. Griliches (1977) has relied
on an instrumental variables approach, using family background
variables excluded from equation (1.1) as instruments. Another
approach is the factor analysis model, pioneered in this context by
Joreskog (1973) and applied to the schooling problem by Chamberlain
and Griliches (1975) and Chamberlain (1978). The factor analysis
approach relies for identification upon orthogonality assumptions
which must be imposed on observable and unobservable components of
$\alpha_i$. The method presented in this paper does not assume a
specification of the components of $\alpha_i$ and may be less
sensitive to our lack of knowledge about the unobservable
individual-specific effect.

Instead, our method uses assumptions about the correlations between
the columns of $(X, Z)$ and $\alpha_i$. If we are willing to specify
which variables among the included right hand side variables of
equation (1.1) are uncorrelated with the individual effects,
conditions may hold such that all of the $\beta$'s and $\gamma$'s
may be consistently estimated. By combining the unbiased
within-groups estimates of the $\beta$'s with the biased
between-groups estimates of the $\beta$'s and $\gamma$'s,
adjustments can be made which produce consistent estimates of
$\gamma$ and more efficient estimates of $\beta$. An alternative
approach which uses these assumptions observes that the columns of
$X_{it}$ which are uncorrelated with $\alpha_i$ can serve two
functions because of their variation across both individuals and
time: (i) using deviations from individual means, they produce
unbiased estimates of the $\beta$'s, and (ii) using the individual
means, they provide valid instruments for the columns of $Z_i$ that
are correlated with $\alpha_i$.

One needs to be quite careful in choosing among the columns of
$X_{it}$ for those variables which are uncorrelated with $\alpha_i$.
For instance, in our returns to schooling example, it may be safe to
assume that health status and age are uncorrelated with $\alpha_i$,
but one might be reluctant to assume that unemployment and
$\alpha_i$ were uncorrelated. An important feature of our method is
that in certain circumstances, the non-correlation assumptions can
be tested, so that the method need not rely totally on a priori
assumptions.

The plan of the paper is as follows. In Section 2, we formally set
up the model and consider estimates proposed in the literature for
cases in which $\alpha_i$ is uncorrelated or correlated with some of
the independent variables. In the latter case, we propose a
consistent but inefficient estimator of all the parameters in the
model. In Section 3, we discuss a variety of tests which determine
when such correlations may be present, generalizing results of
Hausman (1978). In Section 4, we find conditions under which the
parameters are identified and develop an efficient instrumental
variables estimator that accounts for the variance components
structure of the model. We derive a test of the correlation
assumptions necessary for identification and estimation, applying
results from Hausman and Taylor (1980). Section 5 connects our
results with Mundlak's (1978) paper and derives Gauss-Markov
properties of our estimator in special cases. Finally, in Section 6,
we apply the procedure to an earnings function, focusing on the
returns to schooling. These results indicate that when the
correlation of $\alpha_i$ with the independent variables is taken
into account, traditional estimates of the return to schooling are
revised markedly.


2.   Preliminaries 


2.1  Conventional  Estimation 


We begin by developing the model in equation (1.1) slightly and
examining its properties in the absence and presence of
specification errors of the form
$E(\alpha_i \mid X_{it}, Z_i) \neq 0$. Let

(2.1)    $Y_{it} = X_{it}\beta + Z_i\gamma + \varepsilon_{it}$,
         $\varepsilon_{it} = \alpha_i + \eta_{it}$,

where we have reason to believe that
$E(\varepsilon_{it} \mid X_{it}, Z_i) = E(\alpha_i \mid X_{it}, Z_i) \neq 0$.
That is, some of the measured variables among the $X_{it}$ and the
$Z_i$ are correlated with the unobservable individual-specific
effects $\alpha_i$. It will prove convenient to distinguish columns
of $X$ and $Z$ which are asymptotically uncorrelated with $\alpha_i$
from those which are not. For fixed $T$, let

(2.2)    $\operatorname{plim}_{N\to\infty} \frac{1}{N} X_{1it}'\alpha_i = 0$,   $\operatorname{plim}_{N\to\infty} \frac{1}{N} Z_{1i}'\alpha_i = 0$,
         $\operatorname{plim}_{N\to\infty} \frac{1}{N} X_{2it}'\alpha_i = h_x \neq 0$,   and   $\operatorname{plim}_{N\to\infty} \frac{1}{N} Z_{2i}'\alpha_i = h_z \neq 0$,

where $X_{it} = [X_{1it} : X_{2it}]$, $Z_i = [Z_{1i} : Z_{2i}]$ and
the dimensions of $X$ and $Z$ are $TN \times k = [TN \times k_1 : TN \times k_2]$
and $TN \times g = [TN \times g_1 : TN \times g_2]$ respectively.
Note that, somewhat unconventionally, $X_{it}$ and $Z_i$ denote
matrices whose subscripts indicate variation over individuals
($i = 1,\ldots,N$) and time ($t = 1,\ldots,T$). Observations are
ordered first by individual; $\alpha_i$ and each column of $Z_i$ are
thus $TN$ vectors having $T$ identical entries for each
$i = 1,\ldots,N$.

We are thus assuming that $k_2$ columns of $X_{it}$ and $g_2$
columns of $Z_i$ are correlated (asymptotically) with the
time-invariant unobservable $\alpha_i$:
$E(\alpha_i \mid X_{it}, Z_i) \neq 0$. Implicitly, we are also
assuming that there are no other observable exogenous variables
which - along with the $X_{it}$ and $Z_i$ - could enable us to write
$E(\alpha_i \mid X_{it}, Z_i)$ as a linear function of observables
plus an orthogonal error. In addition, we assume no knowledge of
other relationships in which the unobservable $\alpha_i$ enters in a
similar or known fashion. In sum, we are thinking of $\alpha_i$ as
an inherently immeasurable individual-specific effect about which we
have only the prior information embodied in equations (2.1) and
(2.2). Operationally, this means that we cannot obtain a consistent
estimate of the conditional mean of $Y_{it}$ from available
observable data without further assumptions regarding the relative
magnitudes of $(k_1, k_2, g_1, g_2)$.

To derive consistent and efficient estimators for $(\beta, \gamma)$
in equation (2.1), it will be helpful to recall the menu of
appropriate estimators in the absence of misspecification. If we let
$\iota_T$ denote a $T$ vector of ones, two convenient orthogonal
projection operators can be defined as

         $P_V = I_N \otimes \frac{1}{T}\iota_T\iota_T'$,      $Q_V = I_{NT} - P_V$,

which are idempotent matrices of rank $N$ and $TN - N$ respectively.
With data grouped by individuals, $P_V$ transforms a vector of
observations into a vector of group means: i.e.,
$P_V Y_{it} = \frac{1}{T}\sum_{t=1}^{T} Y_{it} \equiv Y_{i\cdot}$.
Similarly $Q_V$ produces a vector of deviations from group means:
i.e., $Q_V Y_{it} = \tilde{Y}_{it} \equiv Y_{it} - Y_{i\cdot}$.
Moreover, $Q_V$ is orthogonal by construction to any time-invariant
vector of observations:
$Q_V Z_i = Z_i - \frac{1}{T}\sum_{t=1}^{T} Z_i = 0$.


Transform model (2.1) by $Q_V$, obtaining

         $Q_V Y_{it} = Q_V X_{it}\beta + Q_V Z_i\gamma + Q_V\alpha_i + Q_V\eta_{it}$

which simplifies to

(2.3)    $\tilde{Y}_{it} = \tilde{X}_{it}\beta + \tilde{\eta}_{it}$.

Least squares estimates of $\beta$ in equation (2.3) are
Gauss-Markov (for the transformed equation) and define the
within-groups estimator

         $\hat\beta_W = (X_{it}'Q_V X_{it})^{-1}X_{it}'Q_V Y_{it} = (\tilde{X}_{it}'\tilde{X}_{it})^{-1}\tilde{X}_{it}'\tilde{Y}_{it}$.

Since the columns of $\tilde{X}_{it}$ are uncorrelated with
$\tilde{\eta}_{it}$, $\hat\beta_W$ is unbiased and consistent for
$\beta$ regardless of possible correlation between $\alpha_i$ and
the columns of $X_{it}$ or $Z_i$. The sum of squared residuals from
this equation can be used to obtain an unbiased and consistent
estimate of $\sigma_\eta^2$, as we shall see shortly. As pointed out
in the Introduction, this within-groups estimator has two serious
defects: (i) it ignores between group variation in the data, and
(ii) the transformation $Q_V$ eliminates time-invariant observables
such as $Z_i$.
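
The projection operators and the within-groups estimator can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: the simulated design, the parameter values and the helper names `group_projections` and `within_estimator` are assumptions made purely for illustration. It builds $P_V$ and $Q_V$ for data ordered by individual, shows that $Q_V$ annihilates time-invariant vectors, and compares the within-groups estimate with pooled least squares when the regressors are correlated with the individual effect.

```python
import numpy as np

def group_projections(N, T):
    """Return (P_V, Q_V) for data stacked by individual, then by time."""
    P = np.kron(np.eye(N), np.full((T, T), 1.0 / T))  # replaces entries by group means
    Q = np.eye(N * T) - P                              # deviations from group means
    return P, Q

def within_estimator(X, y, Q):
    """Least squares of Q_V y on Q_V X, i.e. the within-groups estimator."""
    Xt, yt = Q @ X, Q @ y
    return np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)

rng = np.random.default_rng(0)
N, T = 200, 4                                          # illustrative panel dimensions
P, Q = group_projections(N, T)

alpha = np.repeat(rng.normal(size=N), T)               # individual effect, constant over t
Z = np.repeat(rng.normal(size=N), T) + alpha           # time-invariant, correlated with alpha
X = rng.normal(size=(N * T, 2)) + alpha[:, None]       # time-varying, correlated with alpha
beta, gamma = np.array([1.0, -0.5]), 2.0               # hypothetical true coefficients
y = X @ beta + gamma * Z + alpha + rng.normal(size=N * T)

beta_w = within_estimator(X, y, Q)
beta_ols = np.linalg.lstsq(np.column_stack([X, Z]), y, rcond=None)[0][:2]
print("within:", beta_w, " pooled OLS:", beta_ols)
```

In this design the within-groups estimate should be close to the assumed $\beta$ while the pooled estimate is not, and `Q @ Z` is numerically zero, which is why $\gamma$ cannot be recovered from the transformed data alone.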

To make use of between-group variation, transform model (2.1) by
$P_V$, obtaining

         $P_V Y_{it} = P_V X_{it}\beta + P_V Z_i\gamma + P_V\alpha_i + P_V\eta_{it}$

or

(2.4)    $Y_{i\cdot} = X_{i\cdot}\beta + Z_i\gamma + \alpha_i + \eta_{i\cdot}$.

Least squares estimates of $\beta$ and $\gamma$ in equation (2.4)
are known as between-groups estimators (denoted $\hat\beta_B$ and
$\hat\gamma_B$) and because of the presence of $\alpha_i$, both
$\hat\beta_B$ and $\hat\gamma_B$ are biased and inconsistent if
$E(\alpha_i \mid X_{it}, Z_i) \neq 0$. Similarly, the sum of squared
residuals from equation (2.4) provides a biased and inconsistent
estimator for
$\operatorname{Var}(\alpha_i + \eta_{i\cdot}) = \sigma_\alpha^2 + \frac{1}{T}\sigma_\eta^2$
when $E(\alpha_i \mid X_{it}, Z_i) \neq 0$.

In the absence of misspecification, the optimal use of within and
between groups information is a straightforward calculation. Let

         $Y_{it} = X_{it}\beta + Z_i\gamma + \varepsilon_{it}$

where $E(\varepsilon_{it} \mid X_{it}, Z_i) = 0$ and
$\operatorname{cov}(\varepsilon_{it}) = \Omega = \sigma_\eta^2 I_{TN} + \sigma_\alpha^2 (I_N \otimes \iota_T\iota_T') = \sigma_\eta^2 I_{TN} + T\sigma_\alpha^2 P_V$,
a familiar block-diagonal matrix. Observe that the problem is merely
a linear regression equation with a non-scalar disturbance
covariance matrix. Assuming $\alpha_i$ and $\eta_{it}$ to be
normally distributed, it is easy to show that the within and between
groups coefficient estimators and the sums of squared residuals from
equations (2.3) and (2.4) are jointly sufficient statistics for
$(\beta, \gamma, \sigma_\eta^2, \sigma_\alpha^2)$. The Gauss-Markov
estimator, then, is the optimal matrix-weighted average of the
between and within groups estimators, where the weights depend upon
the variance components $\sigma_\eta^2$ and $\sigma_\alpha^2$ and
are chosen to minimize
$\operatorname{var}(\hat\beta_{GLS}, \hat\gamma_{GLS}) = A V_B A' + (I - A) V_W (I - A)'$,
where $V_B$, $V_W$ denote the covariance matrices of the between and
within groups coefficient estimators. The solution can be written as


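         $(\hat\beta_{GLS}, \hat\gamma_{GLS})' = A\,(\hat\beta_B, \hat\gamma_B)' + (I - A)\,(\hat\beta_W, \hat\gamma_W)'$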


(see, e.g., Maddala (1971)), where

         $A = \left\{ (X:Z)'P_V(X:Z) + \frac{\sigma_\eta^2 + T\sigma_\alpha^2}{\sigma_\eta^2}\,(X:Z)'Q_V(X:Z) \right\}^{-1} (X:Z)'P_V(X:Z)$

"  'W^V 


This is frequently known as the Balestra-Nerlove estimator; it
requires knowledge of the variance components $\sigma_\eta^2$ and
$\sigma_\alpha^2$ but one can substitute consistent estimates for
the variance components without loss of asymptotic efficiency.  2
Observe that if $E(\alpha_i \mid X_{it}, Z_i) \neq 0$, these
Gauss-Markov estimators will be biased and if $h_x \neq 0$ and
$h_z \neq 0$, they will be inconsistent, since they are
matrix-weighted averages of the consistent within-groups estimator
and the inconsistent between-groups estimator.

For both numerical and analytical convenience, we can express the
Gauss-Markov estimator in a slightly different form. Nerlove (1971)
shows that $\Omega$ has two distinct eigenvalues,
$\sigma_\eta^2 + T\sigma_\alpha^2$ of multiplicity $N$ and
$\sigma_\eta^2$ of multiplicity $TN - N$; from equations (2.3) and
(2.4), it follows that the $N$ and $TN - N$ basis vectors spanning
the column spaces of $P_V$ and $Q_V$ span the eigenspaces of
$\Omega$ corresponding to the eigenvalues
$\sigma_\eta^2 + T\sigma_\alpha^2$ and $\sigma_\eta^2$ respectively.
Thus if we weight these basis vectors by
$\theta = \left[\sigma_\eta^2 / (\sigma_\eta^2 + T\sigma_\alpha^2)\right]^{1/2}$,
we obtain the following.

2  The finite sample implications of this substitution are
explored in Taylor (1979).


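Proposition 2.1:  The $TN \times TN$ non-singular matrix

         $\Omega^{-1/2} = \theta P_V + Q_V = I_{TN} - (1 - \theta)P_V$

transforms the disturbance $\alpha_i + \eta_{it}$ into a sequence of
independent and identically distributed random variables.

Proof:  Basis vectors of the column spaces of $P_V$ and $Q_V$ can be
chosen to diagonalize $\Omega$. To make the resulting matrix scalar,
it is necessary to multiply $P_V$ by the square root of the ratio of
the two distinct eigenvalues:

         $\Omega^{-1/2}\,\Omega\,\Omega^{-1/2} = [\theta P_V + Q_V]\,[(\sigma_\eta^2 + T\sigma_\alpha^2)P_V + \sigma_\eta^2 Q_V]\,[\theta P_V + Q_V]$
         $\qquad = \theta^2(\sigma_\eta^2 + T\sigma_\alpha^2)P_V + \sigma_\eta^2 Q_V = \sigma_\eta^2 I_{TN}$.

Alternatively, note that

         $\Omega^{-1/2}\varepsilon_{it} = [I_{TN} - (1 - \theta)P_V](\alpha_i + \eta_{it})$
         $\qquad = \alpha_i - (1 - \theta)\alpha_i + \eta_{it} - (1 - \theta)\eta_{i\cdot} = \theta(\alpha_i + \eta_{i\cdot}) + \tilde{\eta}_{it}$;

since the last two terms are orthogonal,

         $\operatorname{cov}(\Omega^{-1/2}\varepsilon_{is}, \Omega^{-1/2}\varepsilon_{jt}) = 0$ for $s \neq t$ or $i \neq j$,
         $\qquad\qquad = \sigma_\eta^2$ for $s = t$ and $i = j$.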
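
Proposition 2.1 is easy to check numerically for a small panel. The sketch below is illustrative only: the dimensions and the values of the variance components are arbitrary assumptions. It forms $\Omega$, confirms its two eigenvalues and their multiplicities, and verifies that $\theta P_V + Q_V$ turns $\Omega$ into the scalar matrix $\sigma_\eta^2 I_{TN}$.

```python
import numpy as np

N, T = 3, 4
sigma_eta2, sigma_alpha2 = 1.5, 0.7       # arbitrary illustrative variance components

P = np.kron(np.eye(N), np.full((T, T), 1.0 / T))   # P_V
Q = np.eye(N * T) - P                               # Q_V
Omega = sigma_eta2 * np.eye(N * T) + T * sigma_alpha2 * P

theta = np.sqrt(sigma_eta2 / (sigma_eta2 + T * sigma_alpha2))
H = theta * P + Q                          # the matrix written Omega^{-1/2} in the text

eigenvalues = np.sort(np.linalg.eigvalsh(Omega))
scaled = H @ Omega @ H                     # should equal sigma_eta^2 * I

print(eigenvalues)
print(np.allclose(scaled, sigma_eta2 * np.eye(N * T)))
```

Note that, with this scaling, the transformed covariance is $\sigma_\eta^2 I_{TN}$ rather than the identity, exactly as in the proof above.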
We can then premultiply equation (2.1) by $\Omega^{-1/2}$, or -
equivalently - transform the data so that

         $\Omega^{-1/2}Y_{it} = \Omega^{-1/2}X_{it}\beta + \Omega^{-1/2}Z_i\gamma + \Omega^{-1/2}\alpha_i + \Omega^{-1/2}\eta_{it}$,   or

(2.5)    $Y_{it} - (1 - \theta)Y_{i\cdot} = [X_{it} - (1 - \theta)X_{i\cdot}]\beta + \theta Z_i\gamma + \theta\alpha_i + \eta_{it} - (1 - \theta)\eta_{i\cdot}$.

Least squares estimates of $(\beta, \gamma)$ in equation (2.5) are
Gauss-Markov, provided $E(\alpha_i \mid X_{it}, Z_i) = 0$. If
misspecification is present, the fact that $\alpha_i$ appears in
equation (2.5) means that the GLS estimates will be inconsistent.
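
The equivalence between least squares on the transformed equation (2.5) and GLS on the original equation can be verified directly. The following sketch assumes known variance components and a simulated design with one time-varying and one time-invariant regressor; the helper name `quasi_demean` and all numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 30, 5
sigma_eta2, sigma_alpha2 = 1.0, 2.0        # treated as known in this sketch
theta = np.sqrt(sigma_eta2 / (sigma_eta2 + T * sigma_alpha2))

x = rng.normal(size=(N, T))                # time-varying regressor, one row per individual
z = rng.normal(size=N)                     # time-invariant regressor
alpha = rng.normal(scale=np.sqrt(sigma_alpha2), size=(N, 1))
y = 1.0 * x + 0.5 * z[:, None] + alpha + rng.normal(size=(N, T))

def quasi_demean(v):
    """Subtract (1 - theta) times the individual mean, as in equation (2.5)."""
    return v - (1 - theta) * v.mean(axis=1, keepdims=True)

z_full = np.repeat(z[:, None], T, axis=1)  # z repeated over t; quasi-demeaning gives theta * z

# Least squares on the transformed data.
R = np.column_stack([quasi_demean(x).ravel(), quasi_demean(z_full).ravel()])
coef_transformed = np.linalg.lstsq(R, quasi_demean(y).ravel(), rcond=None)[0]

# Direct GLS with the full TN x TN covariance matrix.
P = np.kron(np.eye(N), np.full((T, T), 1.0 / T))
Omega_inv = np.linalg.inv(sigma_eta2 * np.eye(N * T) + T * sigma_alpha2 * P)
D = np.column_stack([x.ravel(), z_full.ravel()])
coef_gls = np.linalg.solve(D.T @ Omega_inv @ D, D.T @ Omega_inv @ y.ravel())

print(coef_transformed, coef_gls)
```

The transformed regression never forms the $TN \times TN$ matrix $\Omega$, which is the numerical convenience referred to above.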

2.2  Consistent But Inefficient Estimation

Despite correlation between the unobservables and the observable
explanatory variables, we saw in Section 2.1 that $\hat\beta_W$ is
unbiased and consistent for $\beta$ but makes no use of
between-group variation in the data. Furthermore, $Q_V Z_i\gamma = 0$,
so that it appears impossible to obtain an estimate of $\gamma$ from
the within-group data. Under appropriate assumptions about $k_1$ and
$g_2$ (the number of exogenous $X$'s and endogenous $Z$'s), it is
possible to obtain consistent estimates of $\gamma$, using the
residuals from the within-groups regression. Let

         $\hat{d}_{it} = Y_{it} - X_{it}\hat\beta_W = \{I_{NT} - X_{it}(X_{it}'Q_V X_{it})^{-1}X_{it}'Q_V\}\,Y_{it}$

be the $TN$ vector of group means estimated from the within-group
residuals. This simplifies to

(2.6)    $\hat{d}_{it} = Z_i\gamma + \alpha_i + \{I_{NT} - X_{it}(\tilde{X}_{it}'\tilde{X}_{it})^{-1}\tilde{X}_{it}'\}\,\eta_{it}$.

Treating the last two terms as an unobservable disturbance, consider
estimating $\gamma$ in equation (2.6). Since $\alpha_i$ is
correlated with the columns of $Z_{2i}$, both OLS and GLS will be
inconsistent for $\gamma$. Consistent estimation is possible
however, if the columns of $X_{1it}$ - uncorrelated with $\alpha_i$
by assumption - provide sufficient instruments for the columns of
$Z_{2i}$ which are correlated with $\alpha_i$. A necessary condition
for this - and thus for the identification of $\gamma$ in equation
(2.6) - is clearly that $k_1 \geq g_2$: that there be at least as
many exogenous time-varying variables as there are endogenous
time-invariant variables. We shall return to the question of
identification of $(\beta, \gamma)$ from equation (2.1) in the next
section.

The 2SLS estimator for $\gamma$ in equation (2.6) is

(2.7)    $\hat\gamma_W = (Z_i'P_W Z_i)^{-1}Z_i'P_W\hat{d}_{it}$

where $W$ denotes the instruments $X_{1it}$ and $Z_{1i}$, and $P_W$
is the orthogonal projection operator onto the column space of $W$.
The sampling error is given by

         $\hat\gamma_W - \gamma = (Z'P_W Z)^{-1}Z'P_W\{\alpha_i + [I_{NT} - X_{it}(\tilde{X}_{it}'\tilde{X}_{it})^{-1}\tilde{X}_{it}']\,\eta_{it}\}$

and under the usual assumptions governing the $X$ and $Z$ processes,
the 2SLS estimator is consistent for $\gamma$, since for fixed $T$,
$\operatorname{plim}_{N\to\infty} \frac{1}{N}W'\alpha_i = 0$ and
$\operatorname{plim}_{N\to\infty} \frac{1}{N}\tilde{X}_{it}'\eta_{it} = 0$.
The fact that the $\hat{d}_{it}$ are calculated from the
within-group residuals suggests that if $\hat\beta_W$ is not fully
efficient, then $\hat\gamma_W$ in equation (2.7) may not be fully
efficient.
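
The two-step procedure can be sketched on simulated data. Everything in the following example is an assumption made for illustration: the data-generating process, the parameter values, and the choice of a single exogenous time-varying regressor `x1` as the only instrument (so $k_1 = g_2 = 1$ and there is no $Z_1$). The residuals are formed as $Y_{it} - X_{it}\hat\beta_W$, following the displayed definition of $\hat{d}_{it}$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 2000, 3
beta_true, gamma_true = np.array([1.0, 0.5]), 2.0   # hypothetical true values

def stack(v):
    """Repeat an N vector T times per individual (data ordered by individual)."""
    return np.repeat(v, T)

def group_mean(v):
    """Replace each entry of a TN vector or TN x k matrix by its individual mean."""
    return np.repeat(v.reshape(N, T, -1).mean(axis=1), T, axis=0).reshape(v.shape)

m = rng.normal(size=N)                        # persistent part of x1, independent of alpha
alpha = rng.normal(size=N)                    # unobservable individual effect
x1 = stack(m) + rng.normal(size=N * T)        # exogenous time-varying regressor (X_1)
x2 = stack(alpha) + rng.normal(size=N * T)    # endogenous time-varying regressor (X_2)
z = stack(m + alpha + rng.normal(size=N))     # endogenous time-invariant regressor (Z_2)
X = np.column_stack([x1, x2])
y = X @ beta_true + gamma_true * z + stack(alpha) + rng.normal(size=N * T)

# Step 1: within-groups estimate of beta.
Xt, yt = X - group_mean(X), y - group_mean(y)
beta_w = np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)

# Step 2: residuals d = y - X beta_w, which contain z * gamma + alpha + noise.
d = y - X @ beta_w

# Step 3: 2SLS of d on z with instrument W = x1, i.e. (z'P_W z)^{-1} z'P_W d.
W, Zm = x1[:, None], z[:, None]
PW_Z = W @ np.linalg.solve(W.T @ W, W.T @ Zm)   # P_W z without forming the TN x TN projector
gamma_2sls = np.linalg.solve(PW_Z.T @ Zm, PW_Z.T @ d)[0]

# For comparison: least squares of d on z ignores the correlation of z with alpha.
gamma_ols = (z @ d) / (z @ z)
print(beta_w, gamma_2sls, gamma_ols)
```

In this design the least squares estimate of $\gamma$ is biased upward because `z` and `alpha` are positively correlated, while the instrumental variables estimate should be close to the assumed value.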


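Having consistent estimates of $\beta$ and - under certain
circumstances - $\gamma$, we can construct consistent estimators for
the variance components $\sigma_\eta^2$ and $\sigma_\alpha^2$.
First, a consistent estimate of $\sigma_\eta^2$ can always be
derived from the within-group residuals; i.e., from the least
squares residuals from equation (2.3). If $Q_{\tilde{X}}$ denotes
$I_{TN} - \tilde{X}_{it}(\tilde{X}_{it}'\tilde{X}_{it})^{-1}\tilde{X}_{it}'$,
we can write the sum of squares of within-group residuals as

         $\tilde{Y}_{it}'Q_{\tilde{X}}\tilde{Y}_{it} = \tilde{\eta}_{it}'Q_{\tilde{X}}\tilde{\eta}_{it} = \tilde{\eta}_{it}'\tilde{\eta}_{it} - \tilde{\eta}_{it}'\tilde{X}_{it}(\tilde{X}_{it}'\tilde{X}_{it})^{-1}\tilde{X}_{it}'\tilde{\eta}_{it}$

so that if $s_\eta^2 = \frac{1}{N(T-1)}\tilde{Y}_{it}'Q_{\tilde{X}}\tilde{Y}_{it}$,

         $\operatorname{plim}_{N\to\infty} s_\eta^2 = \operatorname{plim}_{N\to\infty} \frac{1}{N(T-1)}\tilde{\eta}_{it}'Q_{\tilde{X}}\tilde{\eta}_{it}$
         $\qquad = \operatorname{plim}_{N\to\infty} \frac{1}{N(T-1)}\tilde{\eta}_{it}'\tilde{\eta}_{it} - 0 = \operatorname{plim}_{N\to\infty} \frac{1}{N(T-1)}\eta_{it}'Q_V\eta_{it} = \sigma_\eta^2$

since $\operatorname{rank}(Q_V) = N(T-1)$.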

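Finally, whenever we have consistent estimators for both $\beta$ and
$\gamma$, a consistent estimator for $\sigma_\alpha^2$ can be
obtained. Let

         $s^2 = \frac{1}{N}(Y_{i\cdot} - X_{i\cdot}\hat\beta_W - Z_i\hat\gamma_W)'(Y_{i\cdot} - X_{i\cdot}\hat\beta_W - Z_i\hat\gamma_W)$;

then

         $\operatorname{plim}_{N\to\infty} s^2 = \operatorname{plim}_{N\to\infty} \frac{1}{N}(Y_{i\cdot} - X_{i\cdot}\beta - Z_i\gamma)'(Y_{i\cdot} - X_{i\cdot}\beta - Z_i\gamma)$
         $\qquad = \operatorname{plim}_{N\to\infty} \frac{1}{N}(\alpha_i + \eta_{i\cdot})'(\alpha_i + \eta_{i\cdot}) = \sigma_\alpha^2 + \frac{1}{T}\sigma_\eta^2$

so that $s_\alpha^2 = s^2 - \frac{1}{T}s_\eta^2$ is consistent for
$\sigma_\alpha^2$.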
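
The two variance component estimators can be sketched as follows. This is an illustration under simplifying assumptions rather than the paper's application: the regressors are simulated to be exogenous, so that least squares of the group-mean residual on `z` supplies the required consistent estimate of $\gamma$; with endogenous $Z$'s the estimator of equation (2.7) would take its place. All parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 3000, 4
sigma_eta2, sigma_alpha2 = 1.0, 2.0         # true variance components (illustrative)
beta_true, gamma_true = 1.0, -1.0

alpha = rng.normal(scale=np.sqrt(sigma_alpha2), size=N)
x = rng.normal(size=(N, T))                 # time-varying regressor, one row per individual
z = rng.normal(size=N)                      # time-invariant regressor
eta = rng.normal(scale=np.sqrt(sigma_eta2), size=(N, T))
y = beta_true * x + gamma_true * z[:, None] + alpha[:, None] + eta

# Within-groups estimate of beta and the within residual sum of squares.
xt = x - x.mean(axis=1, keepdims=True)
yt = y - y.mean(axis=1, keepdims=True)
beta_w = (xt * yt).sum() / (xt ** 2).sum()
s_eta2 = ((yt - beta_w * xt) ** 2).sum() / (N * (T - 1))

# Any consistent estimate of gamma will do; with z exogenous here, least squares
# of the group-mean residual on z is consistent (an assumption of this sketch).
d_bar = y.mean(axis=1) - beta_w * x.mean(axis=1)
gamma_hat = (z @ d_bar) / (z @ z)

# s^2 estimates sigma_alpha^2 + sigma_eta^2 / T; subtract the second piece.
s2 = ((d_bar - gamma_hat * z) ** 2).sum() / N
s_alpha2 = s2 - s_eta2 / T
print(s_eta2, s_alpha2)
```

In small samples `s_alpha2` can turn out negative, since it is formed as a difference of two estimates; the sketch makes no attempt to handle that case.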

3.   Specification Tests Using Panel Data

A crucial assumption of the cross-section regression specification
$Y_i = X_i\beta + \varepsilon_i$ ($i = 1,\ldots,N$) is that the
conditional expectation of the disturbances given knowledge of the
right hand side variables is zero: $E(\varepsilon_i \mid X_i) = 0$.
A great advantage of panel data is that following the cross-section
panel over time allows a test of this hypothesis. To derive such a
test, we consider the random effects specification of equation
(1.1), including the time-invariant $Z_i$ among the $X_{it}$ for
notational convenience:

(3.1)    $Y_{it} = X_{it}\beta + \alpha_i + \eta_{it}$    ($i = 1,\ldots,N$;  $t = 1,\ldots,T$).

The unobservable disturbance has been broken into two terms, the
first of which reflects unobservable individual characteristics
unchanging over time which are not represented in $X_{it}\beta$. The
$\eta_{it}$ are random shocks which we assume to be orthogonal to
$\alpha_i$ and the $X_{it}$.

The specification tests which we consider test the null hypothesis

         $H_0$:   $E(\alpha_i \mid X_{it}) = 0$,

against the alternative that $E(\alpha_i \mid X_{it}) \neq 0$. If
$H_0$ is rejected we might try to reformulate the cross-section
specification in the hope of finding a model in which the
orthogonality property holds. Alternatively, we might well be
satisfied with using an estimator which permits consistent
estimation of the slope parameters by controlling for the
correlation between $\alpha_i$ and $X_{it}$. An asymptotically
efficient procedure for doing this is outlined in the latter half of
this paper.

Recall the three estimators for $\beta$ in equation (3.1) -
$\hat\beta_W$, $\hat\beta_B$, $\hat\beta_{GLS}$ - which we discussed
in the previous section. Since these estimators have different
properties under the null and alternative hypotheses, we are led
naturally to form three different specification tests.

(1)   GLS vs. within.  Under the null hypothesis, $\hat\beta_{GLS}$
is efficient, while under the alternative, it is inconsistent.
$\hat\beta_W$ is consistent under both, but inefficient. Consider
the vector

         $\hat{q}_1 = \hat\beta_{GLS} - \hat\beta_W$.

Under $H_0$, $\operatorname{plim} \hat{q}_1 = 0$, while under $H_1$,
$\operatorname{plim} \hat{q}_1 \neq 0$, since
$\operatorname{plim} \hat\beta_{GLS} \neq \beta = \operatorname{plim} \hat\beta_W$.
Hausman (1978) showed that
$\operatorname{var}(\hat{q}_1) = \operatorname{var}(\hat\beta_W) - \operatorname{var}(\hat\beta_{GLS})$
so a $\chi^2$ test is easily formed. This test has been used fairly
frequently and has appeared to be quite powerful.

(2)   GLS vs. between.  Under $H_0$, $\hat\beta_B$ is inefficient
while under the alternative hypothesis it is inconsistent and
$\operatorname{plim} \hat\beta_B \neq \operatorname{plim} \hat\beta_{GLS} \neq \beta$.
Thus deviations of the vector

         $\hat{q}_2 = \hat\beta_{GLS} - \hat\beta_B$

from the zero vector cast doubt upon the null hypothesis. Using
Hausman's (1978) results,
$\operatorname{var}(\hat{q}_2) = \operatorname{var}(\hat\beta_B) - \operatorname{var}(\hat\beta_{GLS})$
which gives rise to another chi-square statistic.

(3)   Within vs. between.  As we have seen, under $H_0$,
$\operatorname{plim} \hat\beta_B = \operatorname{plim} \hat\beta_W = \beta$,
whereas under the alternative hypothesis,
$\operatorname{plim}_{N\to\infty} \hat\beta_W = \beta \neq \operatorname{plim}_{N\to\infty} \hat\beta_B$.
Also, from the characterization in Section 2, the within and between
groups estimators lie in orthogonal subspaces so that $\hat\beta_W$
and $\hat\beta_B$ are uncorrelated. Thus if

         $\hat{q}_3 = \hat\beta_W - \hat\beta_B$,

$\operatorname{var}(\hat{q}_3) = \operatorname{var}(\hat\beta_W) + \operatorname{var}(\hat\beta_B)$,
and a third chi-square statistic is available.

In  considering  these  three  tests,  Hausman  (1978) 
conjectured  that  the  first  test  might  be  better  than  the 

a  a 

third  since  V(q2)  :>  V(q-,);  while  Pudney  (1979)  conjectured 
that  the  second  test  might  be  better  than  the  third  because 
$\hat\beta_{GLS}$ is efficient.  3  (Actually,  var(qO  >_var(q-,).)

A somewhat surprising event occurs, however, if we parameterize the
relationship between $\alpha_i$ and the $X_{it}$. Using the
specification of Mundlak (1978) that

(3.3)    $\alpha_i = X_{i\cdot}\pi + \omega_i$

and assuming that $\omega_i$ and $\eta_{it}$ are independent joint
normal leads to a straightforward maximum likelihood problem.
Assuming that we know $\Omega$ for simplicity, what we might call
the Holy Trinity of statistical tests appears. That is, the three
tests outlined above correspond to the likelihood ratio, Lagrange
multiplier (Rao efficient score), and Wald test respectively.

This, however, creates a problem. Assuming $\Omega$ to be known, the
within and between groups estimators are jointly sufficient for
$\beta$ so that no other information should be present in the data.
In addition, we know the likelihood ratio, Lagrange multiplier and
Wald tests to be identical for testing linear restrictions on linear
models and the null hypothesis $E(\alpha_i \mid X_{it}) = 0$
corresponds - in this special case - to the linear restriction
$\pi = 0$.

3  Pudney actually considered using estimates of $\varepsilon_{it}$
from the three estimators and then basing tests upon the sample
covariance $X'\hat\varepsilon$, using either the within or GLS
estimate of $\beta$ to form $\hat\varepsilon$. However, the tests
are considerably simpler to apply by directly comparing the
$\hat\beta$'s; Pudney was not aware that using $\hat\beta_{GLS}$ to
form $\hat\varepsilon$ is equivalent to the second test.

In this case, it is evident that the tests must be identical, and it
is straightforward to demonstrate this identity in general. Recall
that $\hat\beta_{GLS}$ can be written as a matrix-weighted average
of $\hat\beta_B$ and $\hat\beta_W$,

         $\hat\beta_{GLS} = A\hat\beta_B + (I - A)\hat\beta_W$,

where the weight matrix $A$ is non-singular.  4  While this form of
the GLS estimator is computationally inconvenient, it is extremely
easy to derive the relationships among tests (1-3) from it.
Considering the tests in turn

         $\hat{q}_1 = \hat\beta_{GLS} - \hat\beta_W = A(\hat\beta_B - \hat\beta_W) = -A\hat{q}_3$

and

         $\hat{q}_2 = \hat\beta_{GLS} - \hat\beta_B = (I - A)(\hat\beta_W - \hat\beta_B) = (I - A)\hat{q}_3$

so that the three tests are all non-singular transformations of each
other. Their operating characteristics must therefore be identical.
Indeed,

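Proposition 3.1:  The chi-square statistics for tests (1-3) are
numerically exactly identical.

Proof:  Recall that
$\operatorname{var}(\hat{q}_3) = \operatorname{var}(\hat\beta_B) + \operatorname{var}(\hat\beta_W) \equiv V_3$.
Then $\operatorname{var}(\hat{q}_1) = A V_3 A' \equiv V_1$ and
$\operatorname{var}(\hat{q}_2) = (I - A)V_3(I - A)' \equiv V_2$. Thus

         $\hat{q}_1'V_1^{-1}\hat{q}_1 = \hat{q}_3'A'[A V_3 A']^{-1}A\hat{q}_3 = \hat{q}_3'V_3^{-1}\hat{q}_3$

and

         $\hat{q}_2'V_2^{-1}\hat{q}_2 = \hat{q}_3'(I - A)'[(I - A)V_3(I - A)']^{-1}(I - A)\hat{q}_3 = \hat{q}_3'V_3^{-1}\hat{q}_3$.

4  Note that if $\sigma_\alpha^2$ and $\sigma_\eta^2$ are unknown
and must be estimated, the above identity holds exactly in finite
samples, using the estimated weight matrix $\hat{A}$.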


Since the chi-square statistics which define the tests are
identical, it makes no difference which test is used.
Computationally, the first test might be preferable since it
requires calculation of only those estimators which might be used
under either the null (GLS) or alternative (within-groups)
hypothesis. Note that Hausman (1978) shows that this test - and, as
we have just shown, tests (2) and (3) - can be set up as an F test
in an auxiliary regression so that direct calculation of the
quadratic form is unnecessary.
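
The numerical identity of the three statistics can be confirmed on simulated data. In the sketch below the variance components are treated as known, only time-varying regressors are included so that the within-groups covariance matrix exists, and no intercept is fitted; these choices, the simulated design and the helper name `chi2` are assumptions made for illustration. The weight matrix is computed as $V_W(V_W + V_B)^{-1}$, which is algebraically the inverse-variance weight placed on the between-groups estimator.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, k = 50, 4, 2
sigma_eta2, sigma_alpha2 = 1.0, 0.5          # variance components treated as known

X = rng.normal(size=(N, T, k)) + rng.normal(size=(N, 1, k))   # within and between variation
alpha = rng.normal(scale=np.sqrt(sigma_alpha2), size=N)
y = X @ np.array([1.0, -1.0]) + alpha[:, None] + rng.normal(size=(N, T))

Xbar, ybar = X.mean(axis=1), y.mean(axis=1)              # group means (N x k and N)
Xt = (X - Xbar[:, None, :]).reshape(N * T, k)            # deviations from group means
yt = (y - ybar[:, None]).reshape(N * T)

# Within and between estimators with their covariance matrices.
V_w = sigma_eta2 * np.linalg.inv(Xt.T @ Xt)
b_w = np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)
V_b = (sigma_alpha2 + sigma_eta2 / T) * np.linalg.inv(Xbar.T @ Xbar)
b_b = np.linalg.solve(Xbar.T @ Xbar, Xbar.T @ ybar)

# GLS as the matrix-weighted average with A = V_w (V_w + V_b)^{-1}.
V3 = V_w + V_b
A = V_w @ np.linalg.inv(V3)
b_gls = A @ b_b + (np.eye(k) - A) @ b_w
V_gls = np.linalg.inv(np.linalg.inv(V_w) + np.linalg.inv(V_b))

def chi2(q, V):
    """Quadratic form q' V^{-1} q."""
    return float(q @ np.linalg.solve(V, q))

m1 = chi2(b_gls - b_w, V_w - V_gls)   # test (1): GLS vs. within
m2 = chi2(b_gls - b_b, V_b - V_gls)   # test (2): GLS vs. between
m3 = chi2(b_w - b_b, V3)              # test (3): within vs. between
print(m1, m2, m3)
```

The three printed values agree up to floating-point error, whatever the simulated data, because each contrast is a non-singular linear transformation of the others.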

The important result in this section, however, is that all three
tests are identical. Despite intuition and folk wisdom to the
contrary, it makes no difference which comparison is used in testing
for the presence of correlation between $\alpha_i$ and the columns
of $X$ and $Z$. In the next section, we extend these specification
tests to determine when the prior information in equation (2.2) -
upon which our identification and estimation results depend - is in
agreement with the data.

4.   Instrumental Variables Estimators

4.1  Identification

In  this  section,  we  address  the  question  of  the 
identification  of  some  or  all  of  the  elements  of  (3,Y)  using 
only  the  prior  information  embodied  in  equation  (2.2)  and 
the  time-invar iance  characteristic  of  the  latent  variable 
ex..   Because  the  only  component  of  e..    which  is  correlated 
with  the  explanatory  variables  is  time-invariant,  any  vector 
that  is  orthogonal  to  a  time-invariant  vector  can  be  used  as 
an  instrument,  and  TN-N  linearly  independent  vectors  with 
this  characteristic  can  always  be  constructed.   Recall  from 
Section  2  that 


$$Q_V = I_{TN} - I_N \otimes \frac{1}{T}\,\iota_T \iota_T' = I_{TN} - P_V$$

is an idempotent matrix of rank TN-N which transforms a
TN vector into deviations from individual means. Thus
any set of TN-N basis vectors for the column space of $Q_V$
is orthogonal to any time-invariant vector. In particular,
$Q_V \alpha_i = 0$, from which we conclude that there are always at
least TN-N instruments available in equation (2.1).

Unfortunately, as noted in the introduction, $Q_V$
is also orthogonal to $Z_i$, which violates the requirement
that instruments be correlated with all of the explanatory
variables. We thus need to specialize the familiar results
on identification in linear models to identification of
subsets of parameters. Consider the canonical linear
simultaneous equations model

(4.1)  $Y = X\beta + \varepsilon$

where some columns of X are endogenous and the matrix Z
contains T observations on all variables for which
$\operatorname{plim} Z'\varepsilon = 0$. Consider the projection of equation (4.1)
onto the column space of Z:

(4.2)  $P_Z Y = P_Z X\beta + P_Z \varepsilon$.

Now, if $\lambda$ is a k vector of known constants,

Lemma: A necessary and sufficient condition for $\beta$ to be
identified in equation (4.1) is that every linear function
$\lambda'\beta$ be estimable in equation (4.2).

This useful result follows immediately from a theorem of F.
Fisher (1966, Theorem 2.7.2, p. 56) which implies that the
parameters of a structural equation are identified if and
only if the two stage least squares estimator is well-defined,
which in turn is equivalent to the non-singularity of the
matrix $X'P_Z X$. A function $\lambda'\beta$ is estimable in equation (4.2)
if and only if $\lambda'$ lies in the row space of $P_Z X$ (Scheffe
(1959), Theorem 1, p. 13). For this to hold for any $\lambda$,
$P_Z X$ must be of full column rank, which completes the proof.

Suppose that the conditions of this lemma are not
attained, as occurs in equation (2.1), taking elements of
the column space of $Q_V$ as exogenous. Then

Corollary: A necessary and sufficient condition for a
particular (set of) linear function(s) $\lambda'\beta$ to be identified
in equation (4.1) is that $\lambda'\beta$ be estimable in equation (4.2).

Clearly, if $\lambda'\beta$ is estimable in equation (4.2), then it is
identified. On the other hand, if $\lambda'\beta$ is identified in
(4.1), it has a consistent estimator $a'Y$ for which

$$\operatorname{plim}_{T\to\infty} a'Y = \operatorname{plim}_{T\to\infty} a'X\beta + \operatorname{plim}_{T\to\infty} a'\varepsilon = \lambda'\beta$$

for all $\beta$. Thus $\operatorname{plim} a'\varepsilon = 0$ so that a lies in the column
space of Z, the set of all exogenous variables. Hence
$a = P_Z b$ for some T vector b and

$$\operatorname{plim} a'X\beta = \operatorname{plim} b'P_Z X\beta = \lambda'\beta$$

for all $\beta$, so that $\lambda'$ lies in the (asymptotic) row space
of $P_Z X$. By the previously cited theorem of Scheffe, $\lambda'\beta$
is thus (asymptotically) estimable in equation (4.2),
completing the proof.
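
The row-space criterion in the Corollary lends itself to a simple numerical check. The following is a minimal sketch on simulated data, with the deviations-from-means projector playing the role of the instrument projection; the helper names `numerical_rank` and `estimable` are hypothetical, and a relative singular-value threshold is assumed as the rank criterion.

```python
import numpy as np

def numerical_rank(M, rtol=1e-8):
    """Count singular values above a relative threshold."""
    s = np.linalg.svd(M, compute_uv=False)
    return int((s > rtol * s[0]).sum())

def estimable(lam, PX):
    """lam'beta is estimable iff lam' lies in the row space of PX."""
    return numerical_rank(np.vstack([PX, lam])) == numerical_rank(PX)

rng = np.random.default_rng(0)
N, T, k, g = 8, 3, 2, 1
P_V = np.kron(np.eye(N), np.full((T, T), 1.0 / T))   # individual-mean projector
Q_V = np.eye(N * T) - P_V                            # deviations from means

X = rng.normal(size=(N * T, k))                      # time-varying regressors
Z = np.repeat(rng.normal(size=(N, g)), T, axis=0)    # time-invariant regressor
PX = Q_V @ np.hstack([X, Z])                         # Z column is annihilated

lam_beta = np.array([1.0, 0.0, 0.0])    # picks a coefficient on X
lam_gamma = np.array([0.0, 0.0, 1.0])   # picks the coefficient on Z
beta_ok = estimable(lam_beta, PX)
gamma_ok = estimable(lam_gamma, PX)
print(beta_ok, gamma_ok)
```

With only the columns of $Q_V$ as instruments, linear functions of the coefficients on the time-varying regressors pass the check while the coefficient on the time-invariant regressor does not.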

Returning to the question of identification in
equations (2.1) and (2.2), we observe that even if none of
the columns of $X_{it}$ or $Z_i$ is exogenous ($k_1 = g_1 = 0$), all of
the elements of $\beta$ are identified: simply project equation
(2.1) onto the column space of all the exogenous variables -
such a projection operator is $Q_V$ - and observe that all
linear functions of $\beta$ are estimable, since $X_{it}'Q_V X_{it}$ is
non-singular. The two stage least squares (2SLS) estimator
for $\beta$ in this case is

$$\hat\beta_{2SLS} = (X_{it}'Q_V X_{it})^{-1} X_{it}'Q_V Y_{it} = (\tilde X_{it}'\tilde X_{it})^{-1}\tilde X_{it}'\tilde Y_{it} = \hat\beta_W$$

(where $\tilde X_{it} = Q_V X_{it}$ and $\tilde Y_{it} = Q_V Y_{it}$ denote deviations from
individual means), which is identical to the within-groups estimator. In this
case ($k_1 = g_1 = 0$), it is easy to verify that $\gamma$ is not identified.
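
As a concrete illustration of the properties of $Q_V$ and of the coincidence of 2SLS with the within-groups estimator when $Q_V$ supplies the only instruments, consider the following sketch. The data are simulated, the stacking order (individual-major, time-minor) is an assumption, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, k = 6, 4, 2

P_V = np.kron(np.eye(N), np.full((T, T), 1.0 / T))   # I_N kron (1/T) iota iota'
Q_V = np.eye(N * T) - P_V

alpha = np.repeat(rng.normal(size=N), T)             # time-invariant latent effect
X = rng.normal(size=(N * T, k)) + alpha[:, None]     # regressors correlated with alpha
Y = X @ np.array([1.0, -0.5]) + alpha + rng.normal(size=N * T)

idempotent = np.allclose(Q_V @ Q_V, Q_V)
rank_QV = np.linalg.matrix_rank(Q_V)                 # expected TN - N
kills_alpha = np.allclose(Q_V @ alpha, 0.0)

# 2SLS with Q_V as the projection onto the instruments
beta_2sls = np.linalg.solve(X.T @ Q_V @ X, X.T @ Q_V @ Y)

# within-groups: OLS on deviations from individual means
Xd = X - X.reshape(N, T, k).mean(axis=1).repeat(T, axis=0)
Yd = Y - Y.reshape(N, T).mean(axis=1).repeat(T)
beta_within = np.linalg.lstsq(Xd, Yd, rcond=None)[0]

print(idempotent, rank_QV, kills_alpha)
print(beta_2sls, beta_within)                        # identical up to rounding
```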

Footnote 5: The fact that $\gamma$ is not identified when $k_1 = g_1 = 0$
underscores the importance of the observation
that the disturbance in equation (2.6) is correlated with
$Z_i$, so that instruments are required to estimate $\gamma$.

If prior information suggests that certain columns
of $X_{it}$ and $Z_i$ are exogenous ($k_1 > 0$, $g_1 > 0$), then the columns
of $X_{1it}$ and $Z_{1i}$ must be added to the list of instruments.
Let W denote the matrix $[Q_V : X_{1it} : Z_{1i}]$ and let $P_W$ be the
orthogonal projection operator onto the column space of W.
Then, corresponding to the familiar rank condition, we have

Proposition 4.1: A necessary and sufficient condition that
the entire vector of parameters $(\beta,\gamma)$ be identified in equation
(2.1) is that the matrix

$$\begin{pmatrix} X_{it}' \\ Z_i' \end{pmatrix} P_W \, (X_{it} : Z_i)$$

be non-singular.

Corresponding to the order condition, we have

Proposition 4.2: A necessary condition for the identification
of $(\beta,\gamma)$ in equation (2.1) is that $k_1 \ge g_2$.

Proof: The first proposition is a simple restatement of the
earlier Lemma. Proposition 4.2 asserts that we must have as
many (or more) exogenous X's as we have endogenous Z's: a
familiar enough requirement from the instrumental variables
literature, but here it is used to identify an otherwise
unidentified subset of the parameters. To prove it, observe
that $\operatorname{rank}[P_W(X_{it} : Z_i)] \le \operatorname{rank}[P_W X_{it}] + \operatorname{rank}[P_W Z_i] = k + \operatorname{rank}[P_W Z_i]$,
so that a necessary condition for the matrix
in Proposition 4.1 to be non-singular is that $\operatorname{rank}[P_W Z_i] = g$.
Since $Z_i$ is orthogonal to $Q_V$, $P_W Z_i$ lies in the space spanned by
the time-invariant parts of the $k_1 + g_1$ columns of $[X_{1it} : Z_{1i}]$.
Hence $k_1 + g_1 \ge g_1 + g_2$, i.e. $k_1 \ge g_2$, is necessary for
$\operatorname{rank}[P_W Z_i]$ to equal g, which completes the proof.
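
The rank and order conditions can be explored on simulated data. The sketch below is illustrative only: it assumes the decomposition $P_W = Q_V + P_H$, with H the individual-mean part of the exogenous columns, uses randomly drawn regressors, and the function names are hypothetical.

```python
import numpy as np

def numerical_rank(M, rtol=1e-8):
    """Count singular values above a relative threshold."""
    s = np.linalg.svd(M, compute_uv=False)
    return int((s > rtol * s[0]).sum())

def rank_of_projected_regressors(k1, k2, g1, g2, N=20, T=3, seed=0):
    """Rank of P_W [X : Z]; requires k1 + g1 >= 1 in this sketch."""
    rng = np.random.default_rng(seed)
    P_V = np.kron(np.eye(N), np.full((T, T), 1.0 / T))
    Q_V = np.eye(N * T) - P_V
    X = rng.normal(size=(N * T, k1 + k2))                     # exogenous columns first
    Z = np.repeat(rng.normal(size=(N, g1 + g2)), T, axis=0)   # exogenous columns first
    H = P_V @ np.hstack([X[:, :k1], Z[:, :g1]])               # mean part of [X1 : Z1]
    P_W = Q_V + H @ np.linalg.pinv(H)                         # assumed P_W = Q_V + P_H
    return numerical_rank(P_W @ np.hstack([X, Z]))

r_under = rank_of_projected_regressors(k1=1, k2=1, g1=1, g2=2)   # k1 < g2
r_ident = rank_of_projected_regressors(k1=2, k2=1, g1=1, g2=1)   # k1 >= g2
print(r_under, r_ident)
```

In the first call there are five regressors but the projected matrix has deficient rank, so the matrix of Proposition 4.1 is singular; in the second call the projected regressors have full column rank.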

This discussion of identification in structural models
with panel data has revealed a few noteworthy features.
First, given only the assumption that individual-specific
unobservable components cause some explanatory variables
to be correlated with the disturbance, it is remarkable to
find that the coefficients of the time-varying variables
are identified while those of the time-invariant variables
are not. Second, Mundlak (1978) has shown that when all the
columns of $X_{it}$ and $Z_i$ are correlated with $\alpha_i$ (i.e., $k_1 = g_1 = 0$),
$\hat\beta_W$ is Gauss-Markov for $\beta$. In this case, the 2SLS estimator
coincides with the within-groups estimator for $\beta$ and the
components of $\gamma$ are not identified.

Finally, identification of $\gamma$ can be attained by finding
additional instruments - at least one for every endogenous
column of $Z_i$. Curiously, the $k_1$ exogenous columns of $X_{it}$,
which are included in the structural equation (2.1) in question,
are the only candidates for these identifying instruments.
This contrasts with the conventional simultaneous equations
model in which excluded exogenous variables - such as family
background in the traditional measurement of the return
to education - are required to identify and estimate the
parameters of a structural equation. Intuitively, this works
because only the time-invariant component of the error is
correlated with $(X_{2it}, Z_{2i})$. Since $X_{1it} = \bar X_{1i.} + \tilde X_{1it}$, $\tilde X_{1it}$
can be used as an instrument for $X_{1it}$ and $\bar X_{1i.}$ can be an
instrument for $Z_{2i}$.

4.2  Estimation

If the parameters of equation (2.1) are identified
by means of a specified set of exogenous variables which can
be used as instruments, a consistent and asymptotically efficient
estimator for $(\beta,\gamma)$ can be constructed. Except for the
fact that the disturbance covariance matrix
$\operatorname{var}(\varepsilon_{it}) = \Omega = \sigma_\eta^2 I_{TN} + T\sigma_\alpha^2 P_V$
is non-scalar, equations (2.1) and (2.2) represent
an ordinary structural equation and a list of exogenous
and endogenous variables from which the reduced form can be
calculated. Thus if $\Omega$ were known, two stage least squares
(2SLS) estimates of $(\beta,\gamma)$ in

(4.3)  $\Omega^{-1/2}Y_{it} = \Omega^{-1/2}X_{it}\beta + \Omega^{-1/2}Z_i\gamma + \Omega^{-1/2}\varepsilon_{it}$,

taking as exogenous $X_{1it}$ and $Z_{1i}$, would be asymptotically
efficient, in the sense of converging in distribution to the
limited information maximum likelihood estimator.

Alternatively, the information embodied in equations
(2.1) and (2.2) can be written as a single structural
equation and two multivariate reduced form equations:

$$Y_{it} = X_{it}\beta + Z_i\gamma + \varepsilon_{it}$$
$$X_{2it} = X_{1it}\pi_{11} + Z_{1i}\pi_{12} + Q_V\pi_{13} + v_{1it}$$
$$Z_{2i} = X_{1it}\pi_{21} + Z_{1i}\pi_{22} + Q_V\pi_{23} + v_{2it}$$

where $X_{1it}$, $Z_{1i}$, and $Q_V$ are exogenous, $v_{1it}$ and $v_{2it}$ are correlated
with $\alpha_i$ and thus with $\varepsilon_{it}$, and $\pi_{23} = 0$. Transforming the
structural equation by $\theta$-differencing the data, we can rewrite
the system as

(4.4)
$$\Omega^{-1/2}Y_{it} = \Omega^{-1/2}X_{it}\beta + \Omega^{-1/2}Z_i\gamma + \Omega^{-1/2}\varepsilon_{it}$$
$$X_{2it} = X_{1it}\pi_{11} + Z_{1i}\pi_{12} + Q_V\pi_{13} + v_{1it}$$
$$Z_{2i} = X_{1it}\pi_{21} + Z_{1i}\pi_{22} + Q_V\pi_{23} + v_{2it}$$


again assuming the variance components - and thus $\Omega$ - to
be known.

This system represents the information in equations
(2.1-2.2) in a form convenient for discussing efficiency
of estimators for $\beta$ and $\gamma$. In particular, equations
(4.4) are triangular - because the bottom two equations are
reduced forms - but not recursive - because $v_{1it}$ and $v_{2it}$ are
correlated with $\alpha_i$. In addition, the reduced form equations
are all - by definition - just identified. Since the disturbance
covariance matrix in equations (4.4) is unknown,
the results of Lahiri and Schmidt (1978) imply that OLS is
inconsistent but 3SLS is fully efficient. Finally, since
the reduced forms are just identified, 3SLS estimates of
$(\beta,\gamma)$ in the entire system are identical to 3SLS estimates
of $(\beta,\gamma)$ in the first equation alone (Narayanan, 1969), and
these are, of course, just the 2SLS estimators. Thus 2SLS
estimates of $(\beta,\gamma)$ in equation (4.3) are fully efficient,
given the prior information in equation (2.2), in the sense
that they coincide asymptotically with FIML estimators from
the system (4.4).

Continue the assumption that $\Omega$ is known. 2SLS
estimates of $\beta$ and $\gamma$ in equation (4.3) are equivalent to
OLS estimates of $\beta$ and $\gamma$ in

(4.5)  $P_W\Omega^{-1/2}Y_{it} = P_W\Omega^{-1/2}X_{it}\beta + P_W\Omega^{-1/2}Z_i\gamma + P_W\Omega^{-1/2}\varepsilon_{it}$,

where $P_W$ is the orthogonal projection operator onto the
column space of instruments $W = [X_{1it} : Z_{1i} : Q_V]$. Least squares
applied to this equation is computationally convenient:

(i)   the transformation $\Omega^{-1/2}$ can be done by differencing
      the data, since $\Omega^{-1/2}X_{it} = X_{it} - (1-\theta)\bar X_{i.}$, where
      $\theta = [\sigma_\eta^2/(\sigma_\eta^2 + T\sigma_\alpha^2)]^{1/2}$, as shown in equation (2.5),
(ii)  the projection of the exogenous variables onto
      the column space of W yields the variables themselves, and
(iii) the projection of the endogenous variables onto
      the column space of W can be calculated using
      only time averages, rather than the entire TN
      vectors of observations, as shown in section six.
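
Point (i) can be verified directly. The sketch below builds $\Omega$ for hypothetical variance components, forms its symmetric inverse square root by eigendecomposition, and compares the result with $\theta$-differencing. The scalar factor $\sigma_\eta$ is made explicit here, on the assumption that the normalization in the text drops it.

```python
import numpy as np

N, T = 5, 3
sigma_eta2, sigma_alpha2 = 1.5, 0.8                  # hypothetical variance components

P_V = np.kron(np.eye(N), np.full((T, T), 1.0 / T))
Omega = sigma_eta2 * np.eye(N * T) + T * sigma_alpha2 * P_V
theta = np.sqrt(sigma_eta2 / (sigma_eta2 + T * sigma_alpha2))

# symmetric inverse square root: U diag(w^{-1/2}) U'
w, U = np.linalg.eigh(Omega)
Omega_inv_half = (U / np.sqrt(w)) @ U.T

rng = np.random.default_rng(0)
X = rng.normal(size=(N * T, 2))
X_bar = X.reshape(N, T, 2).mean(axis=1).repeat(T, axis=0)   # individual means, stacked

lhs = np.sqrt(sigma_eta2) * (Omega_inv_half @ X)     # sigma_eta * Omega^{-1/2} X
rhs = X - (1.0 - theta) * X_bar                      # theta-differenced data
print(np.max(np.abs(lhs - rhs)))
```

Because the transformation only involves individual means, the TN x TN matrix never needs to be formed in practice; it is built here purely for the comparison.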
For $\Omega$ known, then, the calculation of asymptotically
efficient estimators of $(\beta,\gamma)$ is straightforward. But the
only case of practical interest is where $\Omega$ (i.e., the variance
components $\sigma_\eta^2$ and $\sigma_\alpha^2$) is unknown and must be estimated.
The question that immediately arises is how $\Omega$ should be
estimated when the only concern is the asymptotic efficiency
of the derived estimators of $(\beta,\gamma)$. Consider the equation

(4.6)  $P_W\hat\Omega^{-1/2}Y_{it} = P_W\hat\Omega^{-1/2}X_{it}\beta + P_W\hat\Omega^{-1/2}Z_i\gamma + P_W\hat\Omega^{-1/2}\varepsilon_{it}$

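where $\hat\Omega$ is any consistent estimator for $\Omega$. Then

Proposition 4.3: For any consistent estimator $\hat\Omega$ of $\Omega$,
least squares estimates of $(\beta,\gamma)$ in equation (4.6) have
the same limiting distribution as the least squares estimates
of $(\beta,\gamma)$ in equation (4.5), based upon a known $\Omega$.

Proof: For notational convenience, absorb the $Z_i\gamma$ into
the $X_{it}\beta$. We shall show that for fixed T,
$\sqrt{N}[\hat\beta(\hat\Omega) - \hat\beta(\Omega)] \xrightarrow{p} 0$.
Adding and subtracting $\beta$, we can write

$$\sqrt{N}[\hat\beta(\hat\Omega) - \hat\beta(\Omega)] = \Big\{ \big(\tfrac{1}{N} X'\hat\Omega^{-1/2}P_W\hat\Omega^{-1/2}X\big)^{-1} \tfrac{1}{N} X'\hat\Omega^{-1/2}W^* \big(\tfrac{1}{N} W^{*\prime}W^*\big)^{-1} \Big\} \tfrac{1}{\sqrt{N}} W^{*\prime}\hat\Omega^{-1/2}(\alpha_i + \eta_{it})$$
$$\qquad - \Big\{ \big(\tfrac{1}{N} X'\Omega^{-1/2}P_W\Omega^{-1/2}X\big)^{-1} \tfrac{1}{N} X'\Omega^{-1/2}W^* \big(\tfrac{1}{N} W^{*\prime}W^*\big)^{-1} \Big\} \tfrac{1}{\sqrt{N}} W^{*\prime}\Omega^{-1/2}(\alpha_i + \eta_{it})$$

where the columns of the $TN \times (TN-N+k_1+g_1)$ matrix $W^*$ span the
column space of W. Since $\hat\Omega$ is consistent for $\Omega$, the terms
in brackets converge in probability to the same matrix.
Expanding $\Omega^{-1/2}(\alpha_i + \eta_{it}) = \theta\alpha_i + \eta_{it} - (1-\theta)\bar\eta_{i.}$
(and likewise with $\hat\theta$ for $\hat\Omega^{-1/2}$), the last terms
reduce to

$$\sqrt{N}\,\hat\theta\,\tfrac{1}{N} W^{*\prime}\alpha_i - \sqrt{N}\,(1-\hat\theta)\,\tfrac{1}{N} W^{*\prime}\bar\eta_{i.} + \tfrac{1}{\sqrt{N}} W^{*\prime}\eta_{it}, \quad \text{and}$$
$$\sqrt{N}\,\theta\,\tfrac{1}{N} W^{*\prime}\alpha_i - \sqrt{N}\,(1-\theta)\,\tfrac{1}{N} W^{*\prime}\bar\eta_{i.} + \tfrac{1}{\sqrt{N}} W^{*\prime}\eta_{it}$$

respectively. Since $\operatorname{plim} \frac{1}{N} W^{*\prime}\alpha_i = 0$ and $\operatorname{plim} \frac{1}{N} W^{*\prime}\bar\eta_{i.} = 0$,
and assuming that both $\sqrt{N}(\hat\theta - \theta)$ and $\frac{1}{\sqrt{N}} W^{*\prime}\eta_{it}$ converge in
distribution to some random variable, it follows that

$$\operatorname{plim}_{N\to\infty} \sqrt{N}\,(\hat\beta(\hat\Omega) - \hat\beta(\Omega)) = 0$$

which completes the proof.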

We have thus shown that the 2SLS estimators of the
parameters in equation (4.3) - using any consistent estimator
for the variance components - are asymptotically
efficient. These estimators coincide with the LS estimators of
$\beta$ and $\gamma$ in equation (4.6); for future reference, let us
denote them by $\hat\beta^*$ and $\hat\gamma^*$.
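
The estimator defined by least squares on equation (4.6) can be sketched in a few lines. This is a minimal illustration, not a production routine: the function name `ht_2sls`, the column-ordering convention (exogenous columns first), the decomposition $P_W = Q_V + P_H$, and the simulated design are all assumptions, and $\theta$ is supplied directly rather than estimated from variance components.

```python
import numpy as np

def ht_2sls(Y, X, Z, k1, g1, N, T, theta):
    """LS on P_W Omega^{-1/2}-transformed data; first k1 columns of X and
    first g1 columns of Z are treated as exogenous (hypothetical convention)."""
    NT = N * T
    P_V = np.kron(np.eye(N), np.full((T, T), 1.0 / T))
    Q_V = np.eye(NT) - P_V
    A = np.eye(NT) - (1.0 - theta) * P_V              # Omega^{-1/2} up to a scalar
    H = P_V @ np.hstack([X[:, :k1], Z[:, :g1]])       # mean part of [X1 : Z1]
    P_H = H @ np.linalg.pinv(H) if H.shape[1] else np.zeros((NT, NT))
    P_W = Q_V + P_H                                   # projection onto [Q_V : X1 : Z1]
    R = P_W @ A @ np.hstack([X, Z])
    coef = np.linalg.lstsq(R, P_W @ A @ Y, rcond=None)[0]
    k = X.shape[1]
    return coef[:k], coef[k:]

# simulated panel: x1, z1 exogenous; x2, z2 correlated with alpha
rng = np.random.default_rng(0)
N, T = 50, 4
alpha = rng.normal(size=N)
x1 = rng.normal(size=(N, T, 2)) + rng.normal(size=(N, 1, 2))
x2 = rng.normal(size=(N, T, 1)) + alpha[:, None, None]
X = np.concatenate([x1, x2], axis=2).reshape(N * T, 3)
z1 = rng.normal(size=(N, 1))
z2 = x1.mean(axis=1).sum(axis=1, keepdims=True) + alpha[:, None] + rng.normal(size=(N, 1))
Z = np.repeat(np.hstack([z1, z2]), T, axis=0)
beta_true, gamma_true = np.array([1.0, 0.5, -1.0]), np.array([0.3, 0.7])
Y = X @ beta_true + Z @ gamma_true + np.repeat(alpha, T) + rng.normal(size=N * T)

theta = np.sqrt(1.0 / (1.0 + T * 1.0))                # unit variance components assumed
beta_star, gamma_star = ht_2sls(Y, X, Z, k1=2, g1=1, N=N, T=T, theta=theta)
print(beta_star, gamma_star)
```

In applied work $\theta$ would be replaced by a consistent estimate; by Proposition 4.3 the limiting distribution is unaffected. The dense TN x TN projectors are used only for transparency and would be replaced by group-mean operations for large panels.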
4.3  Special  Cases 

Depending upon the degree of identification of
$(\beta,\gamma)$ in equation (2.1), the consistent and asymptotically
efficient estimators $(\hat\beta^*,\hat\gamma^*)$ exhibit some interesting
peculiarities, which we examine below. First, to establish
some terminology, recall the order condition for identification
$k_1 \ge g_2$ and its associated rank condition in
Proposition 4.1. We shall refer to the case in which
these conditions hold with equality as just-identified;
when the inequality is strict, the parameters will be said
to be overidentified.

Secondly, we shall be interested in estimating
$\beta$ and $\gamma$ separately from equation (4.6), and two generic
formulae will prove convenient. Let $Y = X_1\beta_1 + X_2\beta_2 + \varepsilon$.
Then

Lemma: The following two expressions for the LS estimator
of $\beta_1$ are identical:

(i)   "parse out" $X_2$ by premultiplying by
      $Q_2 = I - X_2(X_2'X_2)^{-1}X_2'$ and run LS on $Q_2Y = Q_2X_1\beta_1 + Q_2\varepsilon$.
      This yields $\hat\beta_1 = (X_1'Q_2X_1)^{-1}X_1'Q_2Y$.
(ii)  remove the LS estimates of $X_2\beta_2$ from Y and regress
      that on $X_1$: i.e., run LS on $Y - X_2\hat\beta_2 = X_1\beta_1 + \varepsilon$.
      This yields $\hat\beta_1 = (X_1'X_1)^{-1}X_1'[Y - X_2(X_2'Q_1X_2)^{-1}X_2'Q_1Y]$,
      where $Q_1 = I - X_1(X_1'X_1)^{-1}X_1'$.

Proof: The first statement follows immediately from the
formula for a partitioned inverse. The second expression
can be derived from the first by tedious algebra, and
both formulas are probably well-known.
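
The two expressions in the Lemma can be compared numerically with the coefficients from the full regression. The sketch uses arbitrary simulated data; the helper name `annihilator` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
X1 = rng.normal(size=(n, 2))
X2 = rng.normal(size=(n, 3)) + X1[:, :1]     # X2 correlated with X1
Y = rng.normal(size=n)

def annihilator(X):
    """Residual-maker I - X (X'X)^{-1} X'."""
    return np.eye(X.shape[0]) - X @ np.linalg.solve(X.T @ X, X.T)

Q1, Q2 = annihilator(X1), annihilator(X2)

# (i) sweep X2 out first, then regress on the residualized X1
b1_i = np.linalg.solve(X1.T @ Q2 @ X1, X1.T @ Q2 @ Y)

# (ii) subtract the fitted X2 part, then regress on X1
b2 = np.linalg.solve(X2.T @ Q1 @ X2, X2.T @ Q1 @ Y)
b1_ii = np.linalg.solve(X1.T @ X1, X1.T @ (Y - X2 @ b2))

# reference: coefficients on X1 from the joint regression
b_full = np.linalg.lstsq(np.hstack([X1, X2]), Y, rcond=None)[0]
print(b1_i, b1_ii, b_full[:2])
```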


Now, suppose the parameters in equation (2.1) are
underidentified. Suppose

(1) $k_1 = g_1 = 0$. Here there are no exogenous variables
among the $X_{it}$'s or the $Z_i$'s and the set of instruments is
only $W = [Q_V]$. Since $\Omega^{-1/2} = I_{TN} - (1-\theta)P_V$,
$P_W\Omega^{-1/2} = Q_V[I_{TN} - (1-\theta)P_V] = Q_V$ in this case, and $\hat\beta^*$ is exactly the
within-groups estimator $\hat\beta_W$ for $\beta$. For the general underidentified
case,

(2) $k_1 < g_2$; $k_1 > 0$, $g_1 > 0$. Here, the instruments are
$[Q_V : X_{1it} : Z_{1i}]$ which we write as $[Q_V : H]$ for convenience. The model
is

(4.7)  $Y^*_{it} = P_WX^*_{it}\beta + P_WZ^*_i\gamma + \varepsilon^*_{it}$

where $\Omega^{-1/2}X_{it}$ is denoted $X^*_{it}$, etc. Note that $P_WZ^*_i = P_HZ^*_i$ since
$Q_VZ^*_i = 0$. When $k_1 < g_2$, $P_HZ^*_i$ is not of full column rank,
since the dimension of the column space of H is $g_1 + k_1$
and $Z_i$ has $g_1 + g_2$ linearly independent columns. Thus
there exists a g vector $\xi$ such that $P_HZ^*_i\xi = 0$ and the g
vector $\gamma$ cannot be identified since $\gamma$ and $(\gamma + \xi)$ are observationally
equivalent in equation (4.7). To calculate $\hat\beta^*$
we "parse out" $P_HZ^*_i$ in equation (4.7). The column space
of $P_HZ^*_i$ is precisely the column space of H; projecting
$P_WX^*_{it}$ onto the orthocomplement of $P_HZ^*_i$ yields $Q_VX^*_{it}$. Thus
$\hat\beta^*$ in the generic underidentified case is the within-groups
estimator $\hat\beta_W$ and there is no consistent estimator for $\gamma$.

Suppose the parameters of (2.1) are identified. Let

(3) $k_1 = g_2$; $k_1 > 0$, $g_2 > 0$, which is just-identified.
Here again, the rank of $P_HZ^*_i$ equals the rank of H, so that
$\hat\beta^* = \hat\beta_W$. To see this algebraically, note that the parse
operator in (4.7) is $I_{TN} - P_HZ^*(Z^{*\prime}P_HZ^*)^{-1}Z^{*\prime}P_H$, which
simplifies to $I_{TN} - P_H$ since $H'Z^*$ and $Z^{*\prime}H$ are square, non-singular
matrices in the just-identified case. Thus since
$(I_{TN} - P_H)P_WX^*_{it} = Q_VX^*_{it}$, the LS estimate of $\beta$ in (4.7) is $\hat\beta_W$.
Now, however, $P_HZ_i$ has full column rank, and the LS estimate
of $\gamma$ in equation (4.7) is identical to the LS estimate of
$\gamma$ in

$$Y^*_{it} - P_WX^*_{it}\hat\beta_W = P_WZ^*_i\gamma + \varepsilon^*_{it}$$

by the previous Lemma, since $\hat\beta_W$ is the LS estimate of $\beta$ in
equation (4.7). Thus $\hat\gamma^*$ can be written as

$$\hat\gamma^* = (Z^{*\prime}_i P_W Z^*_i)^{-1} Z^{*\prime}_i P_W (Y^*_{it} - X^*_{it}\hat\beta_W)$$
$$\quad = (Z_i' P_W Z_i)^{-1} Z_i' P_W (\bar Y_{i.} - \bar X_{i.}\hat\beta_W) = (Z_i' P_W Z_i)^{-1} Z_i' P_W \hat d_{it}$$

which is the within-groups estimator of $\gamma$ defined in equation
(2.7). For the just-identified case, then, our 2SLS estimators
coincide with the within-groups estimators of both $\beta$ and $\gamma$.
If the parameters are overidentified, the within-groups
procedure is no longer appropriate. In the extreme,
suppose there are no endogenous variables.

(4) $k_2 = g_2 = 0$, which is overidentified. Here, the set of
instruments coincides with the right hand variables in equation
(4.3), so that the 2SLS estimator coincides with LS.
For estimated $\Omega^{-1/2}$, this is identical to the GLS procedure
in equation (2.5); for known $\Omega^{-1/2}$, it is Gauss-Markov.

If endogenous variables are present, consider

(5) $k_1 > g_2$; $k_2 > 0$, $g_2 > 0$, which is the general overidentified
model. In equation (4.7), the column rank of $P_WZ^*_i$
is now g and the column space of $P_HZ^*_i$ no longer coincides
with that of H. Thus $\hat\beta^*$ will differ from $\hat\beta_W$ in the overidentified
case. Since $\hat\gamma^*$ is derived from the regression
of $Y - X\hat\beta^*$ on $P_WZ_i$, $\hat\gamma^*$ will differ from $\hat\gamma_W$, which we derived
from the regression of $Y - X\hat\beta_W$ on $P_WZ_i$.

Since $(\hat\beta^*,\hat\gamma^*)$ are asymptotically efficient, $(\hat\beta_W,\hat\gamma_W)$
are inefficient in the overidentified case. Intuitively,
this inefficiency can be explained by regarding the within-groups
estimators as 2SLS estimators which ignore the instruments
$X_{1it}$ and $Z_{1i}$. It is a peculiar feature of this model
that ignoring these instruments only matters when the parameters
are overidentified.
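
The contrast between the just-identified and overidentified cases can be seen on simulated data. This is a sketch under assumptions: the estimator is computed as least squares on the projected, $\theta$-differenced data with $P_W = Q_V + P_H$, exogenous columns are ordered first, and the function names are hypothetical. The algebraic identities do not depend on whether the exogeneity labels are true for the simulated data.

```python
import numpy as np

def projectors(N, T):
    P_V = np.kron(np.eye(N), np.full((T, T), 1.0 / T))
    return P_V, np.eye(N * T) - P_V

def within(Y, X, N, T):
    """Within-groups estimator of beta."""
    _, Q_V = projectors(N, T)
    return np.linalg.solve(X.T @ Q_V @ X, X.T @ Q_V @ Y)

def ht_beta(Y, X, Z, k1, g1, N, T, theta):
    """beta part of LS on the projected, theta-differenced data (k1 + g1 >= 1)."""
    P_V, Q_V = projectors(N, T)
    A = np.eye(N * T) - (1.0 - theta) * P_V
    H = P_V @ np.hstack([X[:, :k1], Z[:, :g1]])
    P_W = Q_V + H @ np.linalg.pinv(H)
    R = P_W @ A @ np.hstack([X, Z])
    return np.linalg.lstsq(R, P_W @ A @ Y, rcond=None)[0][: X.shape[1]]

rng = np.random.default_rng(2)
N, T, theta = 30, 3, 0.4
alpha = rng.normal(size=N)
X = rng.normal(size=(N * T, 3)) + np.repeat(rng.normal(size=(N, 3)), T, axis=0)
Z = np.repeat(rng.normal(size=(N, 2)) + alpha[:, None], T, axis=0)
Y = X.sum(axis=1) + Z.sum(axis=1) + np.repeat(alpha, T) + rng.normal(size=N * T)

b_w = within(Y, X, N, T)
b_just = ht_beta(Y, X, Z, k1=1, g1=1, N=N, T=T, theta=theta)   # k1 = g2 = 1
b_over = ht_beta(Y, X, Z, k1=2, g1=1, N=N, T=T, theta=theta)   # k1 = 2 > g2 = 1
print(b_w, b_just, b_over)
```

With one exogenous X column and one endogenous Z column the estimate of $\beta$ reproduces the within-groups estimate; adding a second exogenous X column moves it away.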

4.4  Testing the Identifying Restrictions

More efficient estimates of $\beta$ and consistent estimates
of $\gamma$ require prior knowledge that certain columns of $X_{it}$
and $Z_i$ are uncorrelated with the latent $\alpha_i$. An important
feature of our model is that when the parameters are overidentified,
all of these prior restrictions can be tested.
This is an extremely unusual and useful characteristic: unusual
in that it provides a test for the identification of $\gamma$, and
useful since the maintained hypothesis need contain only
the relatively innocuous structure of equation (2.1). It
works, basically, because $\beta$ is always identified so that
$\hat\beta_W$ provides a consistent benchmark against which all (or
some) of the restrictions in equation (2.2) can be tested
by comparing $\hat\beta_W$ with $\hat\beta^*$. The principles of such tests are
outlined in Hausman (1978) and extended in Hausman and Taylor
(1980).

Following the latter analysis, we compare our efficient
estimator for $\beta$ (which uses equation (2.2)) with the within-groups
estimator (which does not require this information for
consistency). The null hypothesis is of the form

$$H_0: \ \operatorname{plim}_{N\to\infty} \frac{1}{N} X_{1it}'\alpha_i = 0 \ \text{ and } \ \operatorname{plim}_{N\to\infty} \frac{1}{N} Z_{1i}'\alpha_i = 0.$$

Under $H_0$, both $\hat\beta_W$ and $\hat\beta^*$ are consistent, while under the
alternative, $\operatorname{plim} \hat\beta^* \ne \operatorname{plim} \hat\beta_W = \beta$. Thus deviations of
$\hat q = \hat\beta^* - \hat\beta_W$ from the zero vector cast doubt upon $H_0$.

To form a $\chi^2$ test based on $\hat q$, premultiply equation (2.1)
by $Q_Z\Omega^{-1/2} = [I_{TN} - Z(Z'Z)^{-1}Z']\Omega^{-1/2}$ and consider the within-groups
and efficient estimators for $\beta$ in the transformed
equation. Letting $X^* = Q_Z\Omega^{-1/2}X$,

(4.8)  $\hat q = [(X^{*\prime}P_WX^*)^{-1}X^{*\prime}P_WQ_Z - (X^{*\prime}Q_VX^*)^{-1}X^{*\prime}Q_VQ_Z]\,\Omega^{-1/2}Y = DY^*$

where $Y^* = \Omega^{-1/2}Y$ is multivariate normal with a scalar
covariance matrix, and mean 0 under $H_0$. From the asymptotic
Rao-Blackwell argument in Hausman (1978),

$$\operatorname{Var}(\hat q) = \operatorname{Var}(\hat\beta_W) - \operatorname{Var}(\hat\beta^*)$$

since we have shown that $\hat\beta^*$ is asymptotically efficient
under $H_0$. From equation (4.8), we can write $\operatorname{Var}(\hat q) = DD'$.
Since $\hat q$ is being used to test $k_1 + g_1$ restrictions - which
may be bigger or smaller than its dimension k - the rank of
the $k \times k$ matrix $DD'$ and the degrees of freedom for the $\chi^2$
test may be less than k. Following Hausman and Taylor (1980),

Lemma: Under the null hypothesis, $\hat q'(DD')^{-}\hat q \sim \chi^2_d$, where
d = rank(D) and $(DD')^{-}$ denotes a generalized
inverse of the covariance matrix of $\hat q$.

Observing in equation (4.8) that $Q_V$ projects onto a proper
subspace of the column space of $P_W$,

Proposition 4.4: $\operatorname{rank}(D) = \min[k_1 - g_2,\ k,\ TN - k]$.

Proof: From Hausman and Taylor (1980),

$$\operatorname{rank}(D) = \min[\operatorname{rank}(X^{*\prime}P_H),\ \operatorname{rank}(I - X^*(X^{*\prime}Q_VX^*)^{-1}X^{*\prime}Q_V)]$$

since $P_W = P_H + Q_V$. Under the usual linear independence
assumptions, the second term in brackets equals TN-k. For the
first term

$$\operatorname{rank}(X^{*\prime}P_H) = \min[k,\ \operatorname{rank}(Q_ZP_H)]$$

and since $P_H = P_ZP_H + Q_ZP_H$, $\operatorname{rank}(Q_ZP_H) = (k_1 + g_1) - (g_1 + g_2)
= (k_1 - g_2)$, which we called the degree of overidentification
in the previous section.
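
The degrees-of-freedom count can be illustrated numerically. The sketch below does not form D through equation (4.8); instead it assumes that both estimators are written as linear maps of Y, takes their difference, and counts its non-negligible singular values. The simulated design, the decomposition $P_W = Q_V + P_H$, and the function name are assumptions.

```python
import numpy as np

def contrast_rank(k1, k2, g1, g2, N=30, T=3, theta=0.4, seed=0, rtol=1e-8):
    """Numerical rank of the map Y -> beta_star - beta_within (k1 + g1 >= 1)."""
    rng = np.random.default_rng(seed)
    NT, k = N * T, k1 + k2
    P_V = np.kron(np.eye(N), np.full((T, T), 1.0 / T))
    Q_V = np.eye(NT) - P_V
    A = np.eye(NT) - (1.0 - theta) * P_V                  # Omega^{-1/2} up to scale
    X = rng.normal(size=(NT, k)) + np.repeat(rng.normal(size=(N, k)), T, axis=0)
    Z = np.repeat(rng.normal(size=(N, g1 + g2)), T, axis=0)
    H = P_V @ np.hstack([X[:, :k1], Z[:, :g1]])
    P_W = Q_V + H @ np.linalg.pinv(H)
    R = P_W @ A @ np.hstack([X, Z])
    # beta_star = M_star @ Y, using R' P_W = R'
    M_star = np.linalg.solve(R.T @ R, R.T @ A)[:k]
    # beta_within = M_within @ Y
    M_within = np.linalg.solve(X.T @ Q_V @ X, X.T @ Q_V)
    s = np.linalg.svd(M_star - M_within, compute_uv=False)
    scale = np.linalg.norm(M_within, 2)                   # reference magnitude
    return int((s > rtol * scale).sum())

r_just = contrast_rank(k1=1, k2=2, g1=1, g2=1)   # k1 - g2 = 0
r_one = contrast_rank(k1=2, k2=1, g1=1, g2=1)    # k1 - g2 = 1
r_two = contrast_rank(k1=3, k2=1, g1=1, g2=1)    # k1 - g2 = 2
print(r_just, r_one, r_two)
```

In each simulated design the rank of the contrast equals the number of overidentifying restrictions, which is the count the proposition assigns to the test when $k_1 - g_2$ is the binding term.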

This specification test of the restrictions embodied
in equation (2.2) has some noteworthy features. The number
of restrictions nominally being tested is $(k_1 + g_1)$, in the
sense that if any of the restrictions in (2.2) is false,
$\hat q$ should differ from zero. Yet the degrees of freedom for
the test depend upon the number of overidentifying
restrictions $(k_1 - g_2)$. Moreover, the degrees of freedom cannot
exceed the dimension of $\beta$ (k) or the degrees of freedom in
the original regression (TN-k), whichever is smaller. When
the model is just-identified, $\hat\beta_W = \hat\beta^*$ (see section 4.3);
in this case, the degrees of freedom are zero and $\hat q = 0$.
Finally, note that the alternative hypothesis does not
require that any of the columns of X or Z be uncorrelated
with $\alpha_i$; hence all of the exogeneity information about
X and Z is subject to test by this procedure.


This test compares instrumental variables estimators under
two nested subsets of instruments: $\hat\beta^*$ uses $[Q_V : X_{1it} : Z_{1i}]$ and
$\hat\beta_W$ uses $[Q_V]$. If one wished to test particular columns of
$X_{1it}$ and $Z_{1i}$ for correlation with $\alpha_i$ while maintaining a particular
set of identifying assumptions, a test - similar to the
above - can be constructed by comparing $(\hat\beta^*,\hat\gamma^*)$ with
$(\hat\beta_W,\hat\gamma_W)$ where $\hat\gamma_W$ is given in equation (2.7). For details,
see Hausman and Taylor (1980).

5.  Mundlak's Model

A final special case is the model discussed at
length by Mundlak (1978), in which no time-invariant observables
are present and all explanatory variables are correlated
with $\alpha_i$:

(5.1)  $Y_{it} = X_{it}\beta + \alpha_i + \eta_{it}$.

The relationship between $\alpha_i$ and $X_{it}$ is expressed by Mundlak
through the "auxiliary" regression $\alpha_i = \bar X_{i.}\pi + \omega_i$ where no
prior information is assumed about $\pi$. Mundlak shows that

(i)   if $\alpha_i$ is correlated with every column of $X_{it}$ ($\pi$ is
      unconstrained), the Gauss-Markov estimator for $\beta$
      is the within-groups estimator $\hat\beta_W$, and
(ii)  if $\alpha_i$ is uncorrelated with every column of $X_{it}$
      ($\pi = 0$), the G-M estimator for $\beta$ is the GLS estimator
      $\hat\beta_{GLS}$ in equation (2.5), assuming $\Omega$ to be
      known.

Recognizing that case (i) is just-identified
($k_1 = g_2 = 0$) and case (ii) is overidentified ($k_2 = g_2 = 0$), the
discussion in (3) and (4) above shows that the 2SLS estimator
$\hat\beta^*$ is identical to the G-M estimator in both cases. More to
the point, if $\alpha_i$ is uncorrelated with some columns of $X_{it}$ and
correlated with others ($\pi$ obeys some linear restrictions),
the model is overidentified ($k_1 > g_2 = 0$) and case (5) above
shows that $\hat\beta^*$ is asymptotically efficient relative to $\hat\beta_W$.
Thus it is only in the two extremes (i) and (ii) that $\hat\beta_W$ or
$\hat\beta_{GLS}$ is appropriate.

We can use this characterization of the G-M estimator,
however, to examine the relationship between $\hat\beta^*$ and the G-M estimator,
should the latter exist. Suppose $\Omega$ is known, and we premultiply
Mundlak's model (5.1) by $\Omega^{-1/2}$ and re-parameterize for
convenience:

(5.2)  $\Omega^{-1/2}Y_{it} = \Omega^{-1/2}X_{it}SS^{-1}\beta + \Omega^{-1/2}\alpha_i + \Omega^{-1/2}\eta_{it} = \Omega^{-1/2}M_{it}\xi + \varepsilon^*_{it}$

where $M_{it} = X_{it}S$, $\xi = S^{-1}\beta$,
$\varepsilon^*_{it} = \theta\alpha_i + \eta_{it} - (1-\theta)\bar\eta_{i.} = \alpha^*_i + \eta^*_{it}$,
and the non-singular transformation S is chosen
so that

$$S'(X_{it}'X_{it})S = I_k.$$

Since the $X_{it}$ are random variables in the analysis, the
matrix S, being a function of the $X_{it}$, will be random also;
since some $X_{it}$ are endogenous, S will also be endogenous.

Let us specify prior information about the correlation
between $X_{it}$ and $\alpha_i$ in a somewhat more flexible manner
than Mundlak's. Let $h_X$ denote the k vector of probability
limits (for fixed T)

$$\operatorname{plim}_{N\to\infty} \frac{1}{N} X_{it}'\alpha_i \equiv h_X = \operatorname{plim}_{N\to\infty} \frac{1}{N} S'^{-1}M_{it}'\alpha_i = S'^{-1}h_M$$

where $h_M$ denotes the corresponding vector of (asymptotic)
correlations between $\alpha_i$ and $M_{it}$. We can express prior
information on $h_X$ as r (r<k) homogeneous linear restrictions

$$Rh_X = 0 = RS'^{-1}S'h_X = R^*h_M$$

which yield r homogeneous restrictions on $h_M$. Note that

(i)   the exogeneity information in equations (2.2)
      can be expressed as $Rh_X = 0$ where each row of R
      has a single 1 and the rest zeroes;
(ii)  the previous results on identification and estimation
      go through, taking the columns of $X_{it}R_i'$ as
      exogenous where $R_i$ (i = 1,...,r) is a row of R;
(iii) homogeneous restrictions on $h_X$ correspond uniquely
      to homogeneous restrictions on $\pi$ in Mundlak's
      specification; i.e.,
      $Rh_X = 0 \Rightarrow \operatorname{plim}_{N\to\infty} \frac{1}{N} R(\bar X_{i.}'\bar X_{i.})(\bar X_{i.}'\bar X_{i.})^{-1}\bar X_{i.}'\alpha_i = 0 \Rightarrow \tilde R\pi = 0$
      where $\tilde R = R(\bar X_{i.}'\bar X_{i.})$.

In the model (5.2), then, certain linear combinations
of the columns of $M_{it}$ are assumed uncorrelated with
$\alpha_i$ and all of the columns of $M_{it}$ are orthogonal.

Proposition 5.1: The 2SLS estimator $\hat\xi^*$ in equation (5.2) is
Gauss-Markov for $\xi$.

Proof: Let F denote the $k \times k$ non-singular matrix

$$F = [R' : B']$$

where the columns of $B'$ ($k \times (k-r)$) are k-r basis vectors for
the column space of $I_k - R'(RR')^{-1}R$. Now, reparameterize
equation (5.2) as

$$\Omega^{-1/2}Y_{it} = \Omega^{-1/2}M_{it}FF^{-1}\xi + \varepsilon^*_{it} = \Omega^{-1/2}[M_{it}R' : M_{it}B']F^{-1}\xi + \varepsilon^*_{it}$$

which we write as

(5.3)  $\Omega^{-1/2}Y_{it} = \Omega^{-1/2}L_{1it}\delta_1 + \Omega^{-1/2}L_{2it}\delta_2 + \varepsilon^*_{it}$

where $\delta = [\delta_1' : \delta_2']' = F^{-1}\xi$. Consider 2SLS estimates of $\delta$ in
equation (5.3), using as instruments $W = [Q_V : L_{1it}]$ since
$\operatorname{plim} \frac{1}{N} L_{1it}'\alpha_i = \operatorname{plim} \frac{1}{N} RM_{it}'\alpha_i = 0$ by assumption. By
construction, $\Omega^{-1/2}L_1$ and $\Omega^{-1/2}L_2$ are orthogonal, and
$P_WL_1 = L_1$, so the 2SLS estimator

$$\hat\delta_1^* = (L_{1it}'\Omega^{-1}L_{1it})^{-1}L_{1it}'\Omega^{-1}Y_{it}$$

coincides with the GLS estimator (for known $\Omega$). It is
Gauss-Markov for $\delta_1$ in this model since all columns of $L_1$
are uncorrelated with $\varepsilon_{it}$ and $L_2$ is orthogonal to $L_1$.
Similarly, the 2SLS estimator for $\delta_2$ is

$$\hat\delta_2^* = (L_2'\Omega^{-1/2}Q_V\Omega^{-1/2}L_2)^{-1}L_2'\Omega^{-1/2}Q_V\Omega^{-1/2}Y_{it}$$

since $P_WL_2 = Q_VL_2$. Since $Q_V\Omega^{-1/2} = Q_V$, this simplifies to
$\hat\delta_2^* = (L_2'Q_VL_2)^{-1}L_2'Q_VY$ which is the within-groups estimator.
Using Mundlak's result (i) above, $\hat\delta_2^*$ is G-M for $\delta_2$ since
every column of $L_2$ is correlated with $\alpha_i$, and $L_2$ is orthogonal
to $L_1$. Hence $\hat\delta^* = [\hat\delta_1^{*\prime} : \hat\delta_2^{*\prime}]'$ is G-M for $\delta$, and since
F is a non-singular, non-stochastic matrix, $\hat\xi^* = F\hat\delta^*$ is
Gauss-Markov for $F\delta = \xi$. This completes the proof.

Two  related  questions  immediately  emerge.   First, 

is  3*  =  S£*  Gauss-Markov  for  3,  since  S  is  non-singular? 

Secondly,  what  became  of  the  intuition  that  2SLS  estimators 

were  biased  and  thus  not  Gauss-Markov? 

Proposition 5.2:  The 2SLS estimator $\beta^*$ coincides with $S\xi^*$,
but $\beta^*$ is biased for $\beta$ and not a Gauss-Markov estimator.

Proof:  Calculate $\beta^*$ directly using 2SLS in the model

$$\Omega^{-1/2}Y_{it} = \Omega^{-1/2}X_{it}\beta + \Omega^{-1/2}\varepsilon_{it}$$

where $W = [Q_V : L_{1it}]$ is the appropriate set of instruments
here, as well as in equation (5.3). Then

$$\beta^* = (X_{it}'\Omega^{-1/2}P_W\Omega^{-1/2}X_{it})^{-1} X_{it}'\Omega^{-1/2}P_W\Omega^{-1/2}Y_{it}$$

$$\phantom{\beta^*} = S[S'X_{it}'\Omega^{-1/2}P_W\Omega^{-1/2}X_{it}S]^{-1} S'X_{it}'\Omega^{-1/2}P_W\Omega^{-1/2}Y_{it}$$

$$\phantom{\beta^*} = S\xi^*.$$

Thus $\beta^*$ is a non-singular transformation of the G-M estimator
$\xi^*$: i.e.,

$$\beta^* = S\xi^* \quad\text{and}\quad \beta = S\xi$$

so that $\beta^* - \beta = S(\xi^* - \xi)$. However, recall that $S$ is a
function of the matrix $X_{it}$; it is endogenous and in calculating
moments of $\beta^* - \beta$, we cannot condition on it. Hence,
in general, $E(\beta^* - \beta) = E[S(\xi^* - \xi)] \neq S\,E(\xi^* - \xi) = 0$, and
$\operatorname{cov}[\beta^* - \beta] = \operatorname{cov}[S(\xi^* - \xi)] \neq S[\operatorname{cov}(\xi^* - \xi)]S'$, where $\operatorname{cov}(\xi^* - \xi)$
attains the Cramer-Rao bound.

A final anomalous property of $\beta^*$ follows from
these propositions. Suppose the original design matrix $X_{it}$
were orthogonal, so that $X_{it}'X_{it} = I_k$. Then the 2SLS estimator
$\beta^*$ using $[Q_V : X_{it}R']$ as instruments would be both unbiased
and Gauss-Markov. One rarely finds a G-M estimator in a
simultaneous equations problem; one does in this model
because 2SLS estimates when all the explanatory variables
are correlated with $\alpha_i$ are identical to the within-
groups estimators, and these are unbiased in finite samples.
To see this, recall that the set of instruments in this
case is just the columns of $Q_V$, and $Q_V$ is orthogonal to
$\alpha_i$ in small samples, not simply as a probability limit.
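
The finite-sample orthogonality of $Q_V$ to the individual effect can be checked numerically. The following Python sketch is only an illustration under assumed values: the panel dimensions, the simulated data, and all variable names are hypothetical and do not come from the paper. It builds $Q_V$ as the deviations-from-individual-means projector, confirms that it annihilates a time-invariant effect exactly, and confirms that 2SLS with the columns of $Q_V$ as instruments reproduces the within-groups estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 40, 3                         # hypothetical panel dimensions

# P_V averages over time within each individual; Q_V takes deviations from means.
P_V = np.kron(np.eye(N), np.full((T, T), 1.0 / T))
Q_V = np.eye(N * T) - P_V

alpha = np.repeat(rng.normal(size=N), T)          # time-invariant effect
X = rng.normal(size=(N * T, 2)) + alpha[:, None]  # regressors correlated with alpha
y = X @ np.array([1.0, -0.5]) + alpha + rng.normal(scale=0.1, size=N * T)

# Q_V annihilates alpha exactly in a finite sample, not just in the limit.
max_abs_Q_alpha = np.abs(Q_V @ alpha).max()

# 2SLS with the columns of Q_V as instruments: the projector P_W equals Q_V.
X_hat = Q_V @ X
beta_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

# Within-groups: OLS on data in deviations from individual means.
Xd, yd = Q_V @ X, Q_V @ y
beta_within = np.linalg.lstsq(Xd, yd, rcond=None)[0]

print(max_abs_Q_alpha, beta_2sls, beta_within)
```

The two coefficient vectors agree to machine precision because $Q_V$ is symmetric and idempotent, so $X'Q_V X = (Q_V X)'(Q_V X)$.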

6.  Estimating  the  Returns  to  Schooling 

In  this  section,  we  apply  our  estimation  and  testing 
techniques  to  a  returns  to  schooling  example.  This  problem 
has  received  extensive  attention  since  many  analysts  have 
felt that the unobserved individual component $\alpha_i$ may contain
an ability component which is correlated with schooling.
Since our sample does not contain an IQ measure, it would
seem likely on a priori grounds that the schooling variable
and $\alpha_i$ are correlated. Yet as Griliches (1977) points out,
it  is  not  clear  in  which  direction  the  schooling  coefficient 
will  be  biased.  While  a  simple  story  of  positive  correlation 
between  ability  and  schooling  leads  to  an  upward  bias  in  the 
OLS  estimate  of  the  schooling  coefficient,  a  model  in  which 
the  choice  of  the  amount  of  schooling  is  made  endogenous 
can  lead  to  a  negative  correlation  between  the  chosen  amount 
of  schooling  and  ability.  In  fact,  both  Griliches  (1977)  and 
Griliches,  Hall,  and  Hausman  (1978)  find  that  treating  schooling 
as  endogenous  with  family  background  variables  as  instruments 
leads  to  a  rise  in  the  estimated  schooling  coefficient  of  about 
50%.⁷ Thus, we would like to investigate how our estimation
method affects the return to schooling coefficient, since
we do not require excluded family background variables to
serve as instruments, as did the previous estimates.

⁷ Using a specification test of the type Wu (1973) and Hausman
(1978) propose, we find a statistically significant difference
between the IV and OLS estimates. Chamberlain (1978) also finds
a significant increase in the schooling coefficient when he
compares OLS estimates with estimates from his two factor model

Our sample consists of 750 randomly chosen (non-SEO)
prime age males, age 25-55, from the PSID sample. We consider
two years, 1968 and 1972, to minimize problems of serial
correlation apart from the permanent individual component.⁸
The sample contains 70 non-whites for which we use a 0-1
variable, a union variable also treated as 0-1, a bad health
binary variable, and a previous year unemployment binary
variable. The two continuous explanatory variables are
schooling and either experience or age.⁹ The PSID data does
not include IQ. The NLS sample for young men would provide
an IQ measure, but problems of sample selection would need to
be treated (as in Griliches, Hall, and Hausman (1978)) which
would cause further econometric complications. Perhaps of more
importance is the fact that for the NLS sample, IQ has an
extremely small coefficient in a log wage specification
(e.g., between .0006 and .002 in Griliches, Hall, and Hausman
(1978)); and if it is included in the specification, it has
only a small effect on the schooling coefficient. Thus we
use the PSID sample without an IQ measure, although our
results should be interpreted with this exclusion in mind.

⁸ Lillard and Willis (1978) demonstrate within a random coefficients
framework that a first order autoregressive process remains
even after the permanent individual effect is accounted for.
Our estimation technique can easily be extended to test for an
autoregressive process, but here we use a simpler case. Note
that we are not investigating the dynamics of wages or earnings
here.

⁹ Experience was used as either experience with present employer
or a measure of age - schooling - 5. Qualitatively, the results
are similar, so we report results using the latter definition.
As the results show, use of age also yields very similar results
for the schooling coefficient. Unlike Griliches (1977), we are
not attempting to separate out the influence of age from experience

We  now  consider  the  estimation  method  proposed  in  Section 
4.2  from  the  standpoint  of  computational  convenience.  Equation 
(4.5)  and  Proposition  4.3  state  the  basic  theoretical  results. 
Given  initial  consistent  instrumental  variables  estimates  of 
$(\beta, \gamma)$, we can estimate $\theta$, and transform the variables by
$\theta$-differencing the data. The model now is of the form of equation
(4.6),  and  OLS  estimates  will  be  asymptotically  efficient. 
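
As a rough illustration of the $\theta$-differencing step, the sketch below assumes the standard variance-components form, in which each variable is replaced by $v_{it} - (1-\theta)\bar v_{i\cdot}$ with $\theta^2 = \sigma_\varepsilon^2 / (\sigma_\varepsilon^2 + T\sigma_\alpha^2)$. That form is an assumption here, since equations (4.5) and (4.6) are not reproduced in this section, and the function names `theta_difference` and `theta_from_variances` are hypothetical.

```python
import numpy as np

def theta_difference(v, ids, theta):
    """Return v_it - (1 - theta) * vbar_i for a 1-D or 2-D array v.

    ids   : individual labels, one per row of v
    theta : scalar in (0, 1]; theta = 1 leaves the data unchanged,
            theta near 0 approaches deviations from individual means.
    """
    v = np.asarray(v, dtype=float)
    ids = np.asarray(ids)
    _, inv, counts = np.unique(ids, return_inverse=True, return_counts=True)
    sums = np.zeros((counts.size,) + v.shape[1:])
    np.add.at(sums, inv, v)                      # per-individual sums
    means = sums / counts.reshape((-1,) + (1,) * (v.ndim - 1))
    return v - (1.0 - theta) * means[inv]

def theta_from_variances(sigma2_eps, sigma2_alpha, T):
    # Assumed form: theta^2 = sigma_eps^2 / (sigma_eps^2 + T * sigma_alpha^2)
    return np.sqrt(sigma2_eps / (sigma2_eps + T * sigma2_alpha))

# Two individuals observed twice each, with theta = 0.5.
y = np.array([1.0, 3.0, 2.0, 6.0])
ids = np.array([0, 0, 1, 1])
y_star = theta_difference(y, ids, 0.5)           # [0.0, 2.0, 0.0, 4.0]
```

In practice the variance components would be estimated from the residuals of the initial consistent estimates, and the same transformation would be applied to the dependent variable and to every regressor.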

The main difficulty that arises is computational: how to
do instrumental variables when the data matrix (of order $T \times N$)
may exceed the computational capacity of much econometric
software. If this occurs, using equation (4.5), calculate
predicted values of $X_2$ and $Z_2$ from their reduced forms. The
predicted $\hat Z_{2i}$'s are formed from a sample size $N$ regression
of $Z_{2i}$ on the columns of $X_{1i\cdot}$ and $Z_{1i}$. For the $\hat X_{2it}$'s, rather
than doing a sample size $T \times N$ regression, an equivalent procedure
is to form

$$\hat X_{2it} = X_{2it} - X_{2i\cdot} + \hat X_{2i\cdot}.$$

The last term, $\hat X_{2i\cdot}$, is calculated from the sample size $N$
regression of $X_{2i\cdot}$ on $X_{1i\cdot}$ and $Z_{1i}$. Then the calculated
$\hat X_{2it}$ and $\hat Z_{2i}$ are used with the $X_{1it}$ and the $Z_{1i}$ in an
OLS regression to obtain consistent estimates of both $\beta$ and
$\gamma$. A similar technique works with the transformed variables
in equation (4.6), which yields asymptotically efficient
estimates of $\beta$ and $\gamma$. The reason that calculating $\hat X_{2it}$ in
this manner is equivalent to the more cumbersome approach of
a $T \times N$ sample regression of $X_{2it}$ on instruments as indicated
in equation (4.4) is that $Q_V$ is orthogonal to any time-invariant
variable. Thus partialling out $Q_V$ in the second and third
equations of (4.4) is equivalent to premultiplying them by
$P_V$, and $\hat X_{2i\cdot}$ and $\hat Z_{2i}$ can be calculated from the sample
size $N$ regressions on $X_{1i\cdot}$ and $Z_{1i}$. To get $\hat X_{2it}$, we must
add $Q_V X_2$ to $\hat X_{2i\cdot}$, so that $\hat X_{2it}$ is given by
$(X_{2it} - X_{2i\cdot}) + \hat X_{2i\cdot}$.
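
The equivalence between the sample size $N$ shortcut and the full $T \times N$ first-stage regression can be verified on simulated data. The Python sketch below is illustrative only: the dimensions, the data-generating process, and the variable names are hypothetical, and the instrument set is taken to be $Q_V$ together with the individual means of $X_1$ and the time-invariant $Z_1$, which is our reading of the text.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 30, 4                                   # hypothetical panel dimensions

def rep(a):
    """Repeat each individual's row T times (stack to NT rows)."""
    return np.repeat(a, T, axis=0)

P_V = np.kron(np.eye(N), np.full((T, T), 1.0 / T))   # individual-means projector
Q_V = np.eye(N * T) - P_V                            # deviations from means

X1 = rng.normal(size=(N * T, 2))               # time-varying, treated as exogenous
Z1 = rep(rng.normal(size=(N, 1)))              # time-invariant, treated as exogenous
X2 = rng.normal(size=(N * T, 1)) + 0.5 * X1[:, :1]   # time-varying, to be predicted

# Shortcut: size-N regression of the means of X2 on the means of X1 and on Z1.
X1_bar = (P_V @ X1)[::T]                       # one row per individual
X2_bar = (P_V @ X2)[::T]
A = np.hstack([X1_bar, Z1[::T]])
X2_bar_hat = A @ np.linalg.lstsq(A, X2_bar, rcond=None)[0]
X2_hat_short = X2 - rep(X2_bar) + rep(X2_bar_hat)

# Cumbersome route: size-NT projection of X2 on [Q_V : means of X1 : Z1].
W = np.hstack([Q_V, P_V @ X1, Z1])
X2_hat_full = W @ np.linalg.lstsq(W, X2, rcond=None)[0]

print(np.abs(X2_hat_short - X2_hat_full).max())
```

The two sets of fitted values coincide because $Q_V$ is orthogonal to every time-invariant column, so the projection onto the instrument space splits into $Q_V$ plus the projection onto the time-invariant instruments.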

If  computational  capacity  is  not  a  difficulty,  a  standard 
instrumental variables package can be used, with $X_{1it}$, $X_{1i\cdot}$,
$X_{2it} - X_{2i\cdot}$, and $Z_{1i}$ as instruments. The variables which are time
invariant  have  T  identical  entries  for  each  individual  i.  So 
long  as  Proposition  4.1  is  satisfied,  the  parameters  are 
identified and the number of columns of $X_{1it}$ is at least as
great as the number of columns of $Z_{2i}$ (i.e., $k_1 \ge g_2$). Note
again how the columns of $X_{1it}$ serve two roles: both in
estimation of their own coefficients and as instruments for
the columns of $Z_{2i}$.


One  note  of  caution,  however.  The  estimates  of  the  variance 
from  the  second  stage  are  inconsistent,  for  the  same  reason 
as  doing  2SLS  in  two  steps  yields  inconsistent  variance 
estimates  in  the  second  step.  To  estimate  the  variances 
consistently,  one  must  use  the  estimated  coefficients  and 
the  model  (2.1)  without  the  hatted  variables  on  the  right 
hand  side. 
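
The caution about second-stage variances can be illustrated with a deliberately simple cross-section example rather than the panel model itself. Everything in the Python sketch below is hypothetical (sample size, coefficients, variable names); it only shows that residuals formed with the fitted regressor give a different variance estimate from residuals formed with the original regressor, and that the latter recovers the structural error variance.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
z = rng.normal(size=(n, 2))                      # instruments
u = rng.normal(size=n)                           # structural error, variance 1
x = z @ np.array([1.0, 0.5]) + 0.8 * u + rng.normal(size=n)   # endogenous regressor
y = 2.0 * x + u

# First stage: fitted values of x from the instruments.
x_hat = z @ np.linalg.lstsq(z, x, rcond=None)[0]

# Second stage: OLS of y on x_hat reproduces the 2SLS coefficient.
b = (x_hat @ y) / (x_hat @ x_hat)

# Residual variance as a naive second-stage OLS would report it (uses x_hat).
s2_second_stage = np.sum((y - b * x_hat) ** 2) / (n - 1)

# Residual variance from the original model (uses x itself, no hat).
s2_structural = np.sum((y - b * x) ** 2) / (n - 1)

print(b, s2_second_stage, s2_structural)
```

Only the second calculation estimates the variance of the structural error; the first also picks up the first-stage residual scaled by the coefficient.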
We now turn to our log wage regressions to determine
the effects on the schooling coefficient of our estimation
procedure. Column 1 of Table 6.1 gives the OLS results while
Column 2 gives the GLS estimates under the assumption of no
correlation between the explanatory variables and $\alpha_i$. The
OLS and GLS estimates are reasonably close, especially the
schooling coefficient which, in both cases, equals .067.
The effects of experience and race stay the same, while the
remaining three coefficients change somewhat, though they are
not estimated very precisely. Note that the correlation
coefficient across the four year period ($\rho$ = .638) indicates
the importance of the unobserved individual effect. The finding
that an additional year of schooling leads to a 6.7% higher
wage is very similar to other OLS results, both on PSID and
other data sets.

In  the  third  column  of  Table  6.1,  we  present  the  within- 
groups  estimate  of  the  wage  equation  specification.  All  the 
time invariant variables are eliminated by the data
transformation, leaving only experience, bad health, and unemployed
last  year.  As  we  have  seen,  the  estimates  of  these  coefficients 
are  unbiased  even  if  the  variables  are  correlated  with  the 
latent  individual  effect.  The  coefficient  estimates  change 
markedly  from  the  first  two  columns.  The  effect  of  bad 
health  falls  by  26%,    the  effect  of  unemployment  falls  by  3^%, 
while  the  effect  of  an  additional  year  of  experience  rises 
by 59%. Comparing the within-groups and GLS estimates, using
results in Hausman (1978), we test the hypothesis that some
of the explanatory variables in our log wage specification
are correlated with the latent $\alpha_i$. Under the null hypothesis,
the statistic is distributed as $\chi^2_3$, and since we compute
m = 20.2, we can reject the null hypothesis with any reasonable
size test. This confirms Hausman's (1978) earlier finding that
mis-specification was present in a similar log wage equation.

Table 6.1   DEPENDENT VARIABLE: LOG WAGE

                     OLS*       GLS        Within     IV/GLS

1. Exp               +.0132      .0133      .0241      .0175
                     (.0011)    (.0017)    (.0042)    (.0026)

2. Race              -.0853     -.0878       -        -.0542
                     (.0328)    (.0518)               (.0588)

3. Bad Health        -.0843     -.0300     -.0388     -.0249
                     (.0412)    (.0363)    (.0460)    (.0399)

4. Unemp Last Yr     -.0015     -.0402     -.0560     -.0636
                     (.0267)    (.0207)    (.0295)    (.0245)

5. Union             +.0450      .0374       -         .0733
                     (.0191)    (.0296)               (.0434)

6. Yrs School        +.0669      .0676       -         .0927
                     (.0033)    (.0052)               (.0191)

Other Variables      Constant   Constant     -        Constant
                     Time       Time                  Time

NOBS                 1500       1500       1500       1500

S.E.R.               .321       .192       .160       .193

RHO: .623

Instruments (IV/GLS): Dad's Educ, Poor, Mom's Educ

* Reported standard errors are inconsistent since they do not account for variance
components.
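
The specification test comparing within-groups and GLS estimates can be sketched as a quadratic form in the coefficient difference, $m = \hat q'[V_W - V_{GLS}]^{-1}\hat q$, following the Hausman (1978) construction. The Python sketch below is a minimal illustration: the function names are hypothetical, the inputs in the worked example are made-up numbers rather than the estimates in Table 6.1, and the tail probability is written in closed form for three degrees of freedom on the assumption that three coefficients are compared.

```python
import math
import numpy as np

def hausman_m(b_consistent, b_efficient, V_consistent, V_efficient):
    """m = q' [V_consistent - V_efficient]^{-1} q, q = coefficient difference."""
    q = np.asarray(b_consistent, float) - np.asarray(b_efficient, float)
    dV = np.asarray(V_consistent, float) - np.asarray(V_efficient, float)
    return float(q @ np.linalg.solve(dV, q))

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

# Made-up two-coefficient example: q = (1, 2), variance difference diag(1, 4).
m_example = hausman_m([1.0, 2.0], [0.0, 0.0],
                      np.diag([2.0, 5.0]), np.diag([1.0, 1.0]))   # 1/1 + 4/4 = 2

# Tail probability for the statistic reported in the text.
p_reported = chi2_sf_3df(20.2)
print(m_example, p_reported)
```

For m = 20.2 the tail probability is far below conventional significance levels, in line with the rejection reported in the text.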

In  the  last  column  of  Table  6.1,  we  present  traditional 

instrumental  variables  estimates  of  the  wage  equation,  treating 

schooling  as  endogenous.  Family  background  variables  are  used 

as  additional  instruments:  father's  education,  mother's 

education,  and  a  binary  variable  for  a  poor  household.  The 

estimated  schooling  coefficient  rises  to  .0915,  which  echoes 

previous  results  of  Griliches  (1977)  and  Griliches,  Hall, 

and  Hausman  (1978)  who  find  an  increase  of  an  almost  identical 

amount. Under the null hypothesis that the instruments are
uncorrelated with $\alpha_i$, the estimated coefficients should be
about the same. Note that the instrumental variables estimates
are  somewhat  closer  to  the  consistent  within-groups  estimates 
than  the  original  OLS  estimates.  We  might  conclude  that  the 
instruments have lessened the correlation of schooling with
$\alpha_i$ by replacing schooling with a linear combination of
background variables serving as instruments. Yet the result
of the specification test is m = 8.70 which again indicates
the presence of remaining correlation between the instruments
and the latent individual effects. We conclude that family
background  variables  are  inappropriate  instruments  in  this 
specification,  perhaps  because  unmeasured  individual  effects 
may  be  transmitted  from  parents  to  children. 

In  the  first  two  columns  of  Table  6.2,  we  present 
the results of our estimation method. We assume that $X_1$
contains experience, bad health, and unemployment last year,
all initially assumed to be uncorrelated with the individual
effect. $Z_1$ is assumed to contain race and union status, while
$Z_2$ contains schooling, which is assumed to be correlated with
$\alpha_i$. The estimated schooling coefficient rises to .125, which
is 62% above the original OLS estimate and 32% above the
traditional instrumental variables estimate. Also, note that
the  effect  of  race  has  now  almost  disappeared:  its  coefficient 
has  fallen  from  -.085  in  the  OLS  regression  to  -.028.  The 
effects  of  experience  and  union  status  have  risen  substantially, 
while  that  of  bad  health  has  fallen. 

Using  the  test  from  Section  4.4,  we  compare  the  within- 
groups  and  efficient  estimates  of  the  X  coefficients.  Observe 
that  the  unemployment  coefficient  is  now  very  close  to  the 
within  estimate,  while  bad  health  and  experience  have  moved 
considerably  closer  to  the  within-groups  estimates  from  either 
the  OLS  or  instrumental  variables  estimates.   The  test  statistic 
is m = 2.24, which is distributed as $\chi^2_2$ under the null hypothesis
of no correlation between the explanatory variables and $\alpha_i$.
While m is somewhat higher than its expected value, 2.0
under $H_0$, we would not reject the hypothesis that the columns
of $X_1$ and $Z_1$ are uncorrelated with the latent individual
effect.

Table 6.2   Dependent Variable: Log Wage

                     HT/IV      HT/GLS     HT/GLS     HT/GLS     HT/GLS

1. EXP                .0217      .0217       -         .0268      .0241
                     (.0027)    (.0031)               (.0037)    (.0045)

2. EXP2                -          -          -        -.00012      -
                                                      (.00015)

3. AGE                 -          -         .0147       -          -
                                           (.0028)

4. RACE              -.0257     -.0278     -.0046     -.0014     -.0175
                     (.0531)    (.0758)    (.0824)    (.0662)    (.0764)

5. BAD HEALTH        -.0535     -.0294     -.0228     -.0243     -.0388
                     (.0468)    (.0307)    (.0318)    (.0318)    (.0348)

6. UNEMP LAST YEAR   -.0556     -.0559     -.0634     -.0634     -.0560
                     (.0311)    (.0246)    (.0265)    (.0236)    (.0279)

7. UNION              .1245      .1227      .1648      .1449      .2240
                     (.0560)    (.0473)    (.0721)    (.0598)    (.2863)

8. YRS SCHOOL         .1247      .1246      .1311      .1315      .2169
                     (.0380)    (.0434)    (.0490)    (.0319)    (.0979)

OTHER VARIABLES      Constant   Constant   Constant   Constant   Constant
                     Time       Time       Time       Time       Time

NOBS                 1500       1500       1500       1500       1500

S.E.R.               .352       .190       .196       .189       .629

RHO                    -        .661       .678       .674       .817

* Reported standard errors are inconsistent since they do not account for
variance components.
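
The comparison of m = 2.24 with a chi-square distribution on two degrees of freedom can be checked by hand, because that distribution has the closed-form upper tail $\exp(-m/2)$. The short Python sketch below is an illustration with a hypothetical function name; it uses only the statistic reported in the text.

```python
import math

def chi2_sf_2df(m):
    """Upper-tail probability of a chi-square variate with 2 degrees of freedom."""
    return math.exp(-m / 2.0)

p_value = chi2_sf_2df(2.24)                 # about 0.33
critical_5pct = -2.0 * math.log(0.05)       # 5% critical value, about 5.99
print(round(p_value, 3), round(critical_5pct, 3))
```

Since 2.24 lies well below the 5% critical value, the null hypothesis is not rejected at conventional levels.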

We  next  present  some  additional  results  to  see  how 
robust  our  estimates  are  to  specification  change.  Column 
3  of  Table  6.2  replaces  experience  with  age.  While  experience 
is arguably correlated with $\alpha_i$ through its schooling component,
age  can  be  taken  as  uncorrelated,  unless  important  cohort 
effects  cause  correlation.  The  results  are  quite  similar  to 
our  previous  findings.  The  effect  of  schooling  is  .120, 
only  slightly  lower  than  the  .125  found  previously.  Race 
again  has  little  or  no  effect,  while  the  effects  of  bad 
health  and  unemployment  are  similar  to  those  in  the  specification 
with experience. In the next column of Table 6.2, we include
both experience and experience squared as explanatory
variables. Again, the results are quite similar to the
original specification. The schooling coefficient increases
from .125 to .132, and race still has little effect. We
conclude that our main results are reinforced by these
alternative specifications.

Neither of these alternative specifications of age or both
experience and experience squared pass the specification
test if estimated by OLS and compared with the appropriate
within-groups estimates. In both specifications, the
latent individual effects continue to be correlated with the
explanatory variables.

Our  last  specification  relaxes  the  correlation  assumptions 
among  the  explanatory  variables.  We  now  remove  experience 
and unemployment from the $X_1$ category to the $X_2$ category,
permitting them to be correlated with $\alpha_i$. Now $X_1$ contains
only bad health. The model is just-identified, so that the
efficient estimates of the coefficients of the $X_{it}$ variables
are identical to the within-groups estimates. The specification
test of Section 4.4 has zero degrees of freedom
and no specification test can be performed. The asymptotic
standard errors have now risen to the point where coefficient
estimates are quite imprecise, especially the schooling
coefficient estimate. Nevertheless, it is interesting to
note that the point estimate of the schooling coefficient
has risen to .217. Thus all our different estimation methods
have led to an increase in the size of the schooling
coefficient. Removing potentially correlated instruments
has  had  a  substantial  effect:  the  point  estimates  change 
and  their  standard  errors  increase.  All  methods  which  control 
for  correlation  with  the  latent  individual  effects  increase 
the  schooling  coefficient  over  those  which  do  not;  and  this 
is  certainly  not  the  direction  that  many  people  concerned 
about  ability  bias  would  have  expected. 

In  this  paper,  we  have  developed  a  method  for  use 
with  panel  data  which  treats  the  problem  of  correlation 
between  explanatory  variables  and  latent  individual  effects. 
Making use of time-varying variables in two ways - both
to estimate their own coefficients and to serve as
instruments for correlated time-invariant variables - allows
efficient estimation of both $\beta$ and $\gamma$. The method is a
twofold improvement over the within-groups estimator: it is
more efficient and also produces estimates of the coefficients
of time-invariant variables. It also appears to be better
than traditional instrumental variables methods which rely
on excluded exogenous variables for instruments. Perhaps
most important is the fact that in the overidentified case
($k_1 > g_2$), a specification test exists which allows a test
of the appropriateness of the instruments. Since the within-
groups estimates of $\beta$ always exist, they provide a baseline
against which further results - using the information in
equations (2.2) - can be compared. If this specification
test  is  satisfied,  we  can  be  confident  in  the  consistency 
of  our  final  results,  since  the  maintained  hypothesis 
embodied  in  the  within-groups  estimator  is  so  weak. 